Introduction

In this report we summarize the efforts undertaken by 
the Center for Applied Mathematical and Statistical Research 
at Southern Methodist University in support of the contract 
NAS 9-16438 since January 31, 1983. For a discussion of the 
progress made on this contract prior to January 31, 1983,
see Final Report SR-63-04408. Our
recent efforts have dealt primarily with an evaluation of 
current techniques for mixture model proportion estimation 
along with investigation into alternative techniques. 
Mixture modeling procedures currently utilized by NASA, e.g. 
CLASSY, assume a mixture of normal components. In addition, 
associated parameter estimation is accomplished using 
maximum likelihood (ML) methods based on the normality 
assumption since these ML estimators are optimal when the 
normality assumption is valid. However, it is well known 
that ML estimation procedures are highly sensitive to 
violations of the underlying assumptions. Recent 
implementation of the mixture model has involved use of 
feature variables from the Badhwar profile model. In
particular, the feature variables currently in use are $T_p$,
the time of peak greenness; $G(T_p)$, the peak greenness; and $V$, a
measure of the length of the growing season. The normality
of these feature variables has been an issue of recent 
concern. 

Our results and investigations can be grouped into four
major categories: 

(1) Further results on the comparison of normal based MLE 
and MDE (Cramer-von Mises distance) 

(2) Use of the Hellinger metric as an alternative to the 
Cramer-von Mises distance used in calculating the MDE 

(3) Investigation into the use of the Weibull as an 
alternative to the normal for modeling component 
distributions. 

(4) Implementation of the estimation procedures on LANDSAT 
data in an effort to: 

(a) investigate the normality (or non-normality) 
of the Badhwar feature variables. 

(b) compare the performance of the MLE and MDE in a 
"real data" situation. 

The progress which has been made in these areas is discussed 
in this report. 


(1) Normal-based MDE vs MLE 


One of our primary investigations has been the 
comparison of normal based MD and ML estimation of the 
mixture proportion for simulated two component mixture 
models. We have compared the estimation procedures for
simulated mixtures of normal and of non-normal components.
Our investigations in this area were previously documented
in NASA technical report SR-62-04376. In that report we
showed ML procedures to be superior when the normal 
component assumption is valid while MD procedures perform 
better on the simulated mixtures of components which 
represent symmetric departures from normality. Mixtures of 
t(4) components were examined in that report. 

Our recent results have included more extensive 
simulations in which double exponential and t(2) components 
were examined. The double exponential was chosen since it 
has lighter tails than a t(4) yet heavier tails than a normal
distribution. Tests for goodness-of-fit usually have little
power in distinguishing normal and double exponential data.
In addition, t(2) components were examined in order to
compare the estimation procedures in a heavier tailed
setting than the t(4). In fact, the t(2) distribution has
infinite variance, and not surprisingly, realizations often
have a few extreme observations. Our results show that in
both of these non-normal situations, the MDE provides better 
proportion estimates than the MLE. This improvement is 
particularly striking for the t(2) simulations as the MDE 
seems to be relatively insensitive to a few extreme values. 

In these simulations we have initiated the iterative 
routines used to calculate the MDE and the MLE with starting 
values obtained using a somewhat ad-hoc quasi-clustering 
technique. We have observed that in some situations, 
particularly those with heavy overlap between component 
distributions, the starting values perform better than both 
the MLE and MDE, an interesting finding since the starting 
value routine is easy to implement and is very fast since it 
does not involve iteration.

Asymptotic results have been obtained which establish 
the strong consistency and asymptotic normality of the MDE 
in the mixture-of-normals setting. The form of the 
asymptotic variance of the MDE is available from these 
results so that the asymptotic relative efficiencies (AREs) 
of the MDE relative to the MLE can be found. We have 
calculated these AREs for several parameter configurations. 
These AREs are fairly comparable to the empirical finite 
sample results. The following reports have been written 
since January 31, 1983 concerning our work in this area. 
Report [1] is included as Appendix A in this document. 
[1] "Minimum Distance Estimation of Mixture Model
Parameters - Asymptotic Results and Simulation
Comparisons with Maximum Likelihood" by Wayne A.
Woodward, William C. Parr, William R. Schucany, and
Henry L. Gray (SR-63-04427), June 1983.

[2] "A Comparison of Minimum Distance and Maximum
Likelihood Estimation of a Mixture Proportion" by
Wayne A. Woodward, William C. Parr, William R.
Schucany, and Hildegard Lindsey - submitted to
Journal of the American Statistical Association.


(2) Minimum Hellinger Distance Estimation

We have also investigated the use of the Hellinger 
metric for calculating the MDE. In our previous work, the 
Cramer-von Mises distance has been used exclusively for this
calculation. The minimum Hellinger distance estimator (MHDE) 
is of interest to aerospace remote sensing since it has the 
potential of providing robust proportion estimates under 
deviations from normality while maintaining performance 
comparable to the MLE when the underlying components 
actually are normal. However, our initial implementations of 
this procedure have shown that although the results are 
encouraging, the iterative procedure is highly sensitive to 
starting values and is computationally more difficult than
the MDE based upon Cramer-von Mises distance. More 
investigation is needed before the MHDE can be considered to 
be a viable alternative for proportion estimation. 
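
To make the metric concrete, the following sketch computes the squared Hellinger distance between two normal densities by numerical integration and checks it against the known closed form for normals. It only illustrates what the distance measures. The convention used (one minus the integral of the square root of the product of the densities), the function names, the integration grid, and the example parameters are all assumptions chosen for illustration and are not taken from the MHDE implementation discussed here.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def hellinger_sq_numeric(f, g, lo, hi, steps=20000):
    """Squared Hellinger distance, 1 - integral of sqrt(f*g), by the midpoint rule."""
    h = (hi - lo) / steps
    affinity = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        affinity += math.sqrt(f(x) * g(x)) * h
    return 1.0 - affinity


def hellinger_sq_normals(mu1, s1, mu2, s2):
    """Closed form of the same quantity for two normal densities."""
    v = s1 * s1 + s2 * s2
    return 1.0 - math.sqrt(2.0 * s1 * s2 / v) * math.exp(-((mu1 - mu2) ** 2) / (4.0 * v))


# Illustrative (hypothetical) component parameters.
f = lambda x: normal_pdf(x, 0.0, 1.0)
g = lambda x: normal_pdf(x, 2.0, 1.5)
numeric = hellinger_sq_numeric(f, g, -15.0, 20.0)
exact = hellinger_sq_normals(0.0, 1.0, 2.0, 1.5)
print(f"numeric = {numeric:.6f}, closed form = {exact:.6f}")
```

In a minimum Hellinger distance fit, one of the two densities would be a nonparametric density estimate from the sample and the other the fitted mixture; that step is not shown here.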

As a result of our investigations of the MHDE, the
following report has been written and is included here as 
Appendix B. 

[3] "Minimum Hellinger Distance Estimation of Mixture 
Model Parameters" by Wayne A. Woodward and Paul W. 
Eslinger (SR-63-04433), July 1983.

(3) Weibull Based MD Estimation 

The MD and ML estimation procedures discussed in the 
previous sections were both based upon a mixture of normal 
components. Our results showed that although the MDE is more 
robust to symmetric departures from component normality, 
neither normal based procedure provided adequate results in 
the presence of asymmetric departures. We have investigated 
the use of the Weibull distribution as an alternative to the 
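normal since the Weibull can be symmetric or skewed (to the
right or to the left), and it therefore provides a very
flexible model. The density function for the three parameter
Weibull is given by

$$f(x) = \frac{\gamma}{\beta}\left(\frac{x-\alpha}{\beta}\right)^{\gamma-1} \exp\left\{-\left(\frac{x-\alpha}{\beta}\right)^{\gamma}\right\}, \qquad x > \alpha, \tag{1}$$

where $\beta > 0$ and $\gamma > 0$. The mean and variance are given
by

$$\mu = \alpha + \beta\,\Gamma\!\left(\frac{1}{\gamma} + 1\right), \qquad \sigma^2 = \beta^2\left\{\Gamma\!\left(\frac{2}{\gamma} + 1\right) - \Gamma^2\!\left(\frac{1}{\gamma} + 1\right)\right\}. \tag{2}$$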

The parameter $\gamma$ serves as a shape parameter. When $\gamma = 3.6$ the
Weibull is approximately symmetric and, in fact, quite similar to the
normal distribution. The Weibull is skewed to the left or to
the right depending on whether $\gamma > 3.6$ or $\gamma < 3.6$, respectively.
The following technical report addresses the use of the
Weibull in mixture proportion estimation. It is included
here as Appendix C.

[4] "Proportion Estimation in Mixtures of Asymmetric
Distributions" by Wayne A. Woodward, Richard P. Gunst,
Hildegard Lindsey, and H.L. Gray
(SR-63-04409), May 1983.
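
The role of the shape parameter can be checked numerically. The sketch below evaluates the mean and variance of the three-parameter Weibull from the gamma-function expressions given above and adds the standard moment-based skewness coefficient, which is not stated in this report and is included here as an assumption for illustration. The function name and the shape values tried are likewise illustrative.

```python
import math


def weibull_moments(alpha, beta, gamma):
    """Mean, variance, and skewness of the three-parameter Weibull."""
    g1 = math.gamma(1.0 + 1.0 / gamma)
    g2 = math.gamma(1.0 + 2.0 / gamma)
    g3 = math.gamma(1.0 + 3.0 / gamma)
    mean = alpha + beta * g1
    var = beta ** 2 * (g2 - g1 ** 2)
    # Third standardized central moment; it does not depend on alpha or beta.
    skew = (g3 - 3.0 * g1 * g2 + 2.0 * g1 ** 3) / (g2 - g1 ** 2) ** 1.5
    return mean, var, skew


for shape in (2.0, 3.6, 6.0):
    mean, var, skew = weibull_moments(0.0, 1.0, shape)
    print(f"gamma = {shape:3.1f}  mean = {mean:.4f}  var = {var:.4f}  skew = {skew:+.4f}")
```

The sign of the skewness changes near a shape value of 3.6, with positive skewness (right skew) below it and negative skewness (left skew) above it.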

The $\chi^2(9)$ distribution was used in the simulations in report [4] to
assess the effect of component asymmetry. In that report,
the iterative procedures were started at "truth" rather than
at starting values obtained from the data. In Tables 1 and 2 we
present the results of a recent and more extensive set of
simulations than those quoted in [4]. In the new
simulations, starting values for $\mu_i$, $\sigma_i^2$, and $p$, $i = 1, 2$, were
obtained from the data as discussed in [2]. The starting
values for $\mu_i$ and $\sigma_i^2$ were then converted to estimates of
$\alpha_i$ and $\beta_i$ using equations (2) with $\gamma_i = 3.6$, $i = 1, 2$. The starting
value estimate for $p$ remains unchanged. The simulations
summarized in Table 1 are based on simulated mixtures of
normal components while those in Table 2 are based on
simulated mixtures of $\chi^2(9)$ distributions. As in [4] we
examined overlaps, as defined in [2], of .10 and .03, and
mixing proportions of .25, .50, and .75. We have added the
case in which the variance of component 1 is twice that of
component 2. In these tables we compare the MLE based on a
mixture of normals model (MLEN), the MDE based on a mixture
of normals model (MDEN), and the MDE based on a mixture of
Weibulls model (MDEW).
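
The conversion of the starting values can be written out directly by inverting the mean and variance expressions for the Weibull with the shape fixed at 3.6. The sketch below is one way to do this. The function name and the example mean and variance are hypothetical and are not taken from the simulation program.

```python
import math


def weibull_start_from_moments(mu, sigma2, gamma=3.6):
    """Solve the Weibull mean and variance expressions for (alpha, beta) at a fixed shape."""
    g1 = math.gamma(1.0 + 1.0 / gamma)
    g2 = math.gamma(1.0 + 2.0 / gamma)
    beta = math.sqrt(sigma2 / (g2 - g1 ** 2))  # from sigma^2 = beta^2 {G(2/g + 1) - G^2(1/g + 1)}
    alpha = mu - beta * g1                     # from mu = alpha + beta G(1/g + 1)
    return alpha, beta


# Hypothetical component starting values for the mean and variance.
alpha0, beta0 = weibull_start_from_moments(mu=10.0, sigma2=4.0)

# Substituting back should recover the mean and variance that were supplied.
g1 = math.gamma(1.0 + 1.0 / 3.6)
g2 = math.gamma(1.0 + 2.0 / 3.6)
mean_back = alpha0 + beta0 * g1
var_back = beta0 ** 2 * (g2 - g1 ** 2)
print(f"alpha = {alpha0:.4f}, beta = {beta0:.4f}, mean = {mean_back:.4f}, var = {var_back:.4f}")
```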

The results here are similar to those shown in
SR-63-04409 in that the normal based procedures performed
better on the mixtures of normal components while the
Weibull based MDE was generally superior on mixtures of
$\chi^2(9)$ components. Again, the starting value routine obtained
estimators which were competitive with and often better than
those estimators obtained through the iterative routines.
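
When reading the Bias and MSE columns of the tables, it can help to separate the two sources of error. The sketch below assumes the usual definitions (bias as the average estimate minus the true proportion, MSE as the average squared error over replications), so that MSE equals variance plus squared bias. The function name and the five estimates are hypothetical and serve only to show the arithmetic.

```python
def summarize(estimates, p_true):
    """Average estimate, bias, MSE, and the variance part of the MSE."""
    n = len(estimates)
    mean_est = sum(estimates) / n
    bias = mean_est - p_true
    mse = sum((e - p_true) ** 2 for e in estimates) / n
    variance = mse - bias ** 2  # MSE = variance + bias^2
    return mean_est, bias, mse, variance


# Hypothetical proportion estimates from five replications with true p = .25.
estimates = [0.21, 0.30, 0.27, 0.24, 0.33]
mean_est, bias, mse, variance = summarize(estimates, 0.25)
print(f"mean = {mean_est:.3f}, bias = {bias:+.3f}, MSE = {mse:.4f}, variance = {variance:.4f}")
```

Applied to a tabled row under the same assumed definitions, a bias of .02 with an MSE of .023 leaves about .023 - .0004 = .0226 attributable to variance, so in that case nearly all of the error is variability rather than bias.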

Table 1. Comparison of Proportion Estimation Techniques
Simulated Mixtures of Normal Components
n = 200
number of repetitions = 100

Case $\sigma_1^2 = \sigma_2^2$

                    Overlap = .10             Overlap = .03
 p     Method   p-hat    Bias     MSE     p-hat    Bias     MSE
 .25   MLEN       .27     .02    .023       .26     .01    .002
       MDEN       .31     .06    .045       .27     .02    .005
       MDEW       .33     .08    .042       .33     .08    .015
       Starts     .30     .05    .009       .29     .04    .005

 .50   MLEN       .50     .00    .020       .49    -.01    .002
       MDEN       .50     .00    .019       .49    -.01    .002
       MDEW       .49    -.01    .028       .49    -.01    .005
       Starts     .51     .01    .007       .50     .00    .005

Case $\sigma_1^2 = 2\sigma_2^2$

                    Overlap = .10             Overlap = .03
 p     Method   p-hat    Bias     MSE     p-hat    Bias     MSE
 .25   MLEN       .24    -.01    .011       .24    -.01    .002
       MDEN       .31     .06    .031       .25     .00    .005
       MDEW       .36     .11    .082       .25     .00    .011
       Starts     .23    -.02    .007       .25     .00    .003

 .50   MLEN       .49    -.01    .012       .50     .00    .002
       MDEN       .50     .00    .016       .50     .00    .002
       MDEW       .49    -.01    .027       .50     .00    .007
       Starts     .41    -.09    .015       .45    -.05    .007

 .75   MLEN       .70    -.05    .025       .74    -.01    .002
       MDEN       .64    -.11    .057       .73    -.02    .004
       MDEW       .59    -.16    .057       .71    -.04    .009
       Starts     .59    -.16    .035       .66    -.09    .012


Table 2. Comparison of Proportion Estimation Techniques
Simulated Mixtures of $\chi^2(9)$ Components
n = 200
number of repetitions = 100

Case $\sigma_1^2 = \sigma_2^2$

                    Overlap = .10             Overlap = .03
 p     Method   p-hat    Bias     MSE     p-hat    Bias     MSE
 .25   MLEN       .27     .02    .096       .17    -.08    .007
       MDEN       .34     .09    .106       .17    -.08    .008
       MDEW       .35     .10    .049       .32     .06    .011
       Starts     .32     .07    .035       .27     .02    .003

 .50   MLEN       .26    -.24    .062       .41    -.09    .011
       MDEN       .29    -.21    .058       .42    -.08    .009
       MDEW       .42    -.08    .023       .49    -.01    .005
       Starts     .48    -.02    .007       .51     .01    .004

 .75   MLEN       .48    -.27    .080       .65    -.10    .013
       MDEN       .46    -.29    .095       .63    -.12    .016
       MDEW       .55    -.20    .075       .71    -.04    .006
       Starts     .67    -.08    .013       .66    -.09    .013

Case $\sigma_1^2 = 2\sigma_2^2$

                    Overlap = .10             Overlap = .03
 p     Method   p-hat    Bias     MSE     p-hat    Bias     MSE
 .25   MLEN       .16    -.09    .050       .19    -.06    .006
       MDEN       .28     .03    .069       .18    -.07    .008
       MDEW       .37     .12    .062       .31     .06    .010
       Starts     .26     .01    .026       .25     .00    .003

 .50   MLEN       .28    -.22    .053       .43    -.07    .007
       MDEN       .31    -.19    .041       .43    -.07    .007
       MDEW       .45    -.05    .015       .49    -.01    .003
       Starts     .41    -.09    .015       .45    -.05    .007

 .75   MLEN       .46    -.29    .089       .65    -.10    .012
       MDEN       .50    -.25    .076       .64    -.11    .016
       MDEW       .58    -.17    .049       .69    -.06    .008
       Starts     .60    -.15    .030       .61    -.14    .023

A few other comments are in order here. First, we
believe that if the asymmetry can be assumed to be in only
one direction (probably to the right for the profile
variables under consideration), then the estimation results
shown in Tables 1 and 2 can be improved. Another interesting
finding was made during these simulations concerning the
3-parameter Weibull. Although we only show summary results
in this report, estimates of all 7 of the
parameters of the fitted mixture-of-Weibulls model are printed out
by the simulation program for each sample generated. For
several samples $\alpha$ (< 0) and $\beta$ were very large in absolute
value, sometimes greater than 1000. These parameter values
were associated with a $\gamma$ smaller in magnitude than $\alpha$
and $\beta$ but substantially larger than 3.6. Although these
parameter values appear to be "very bad", plots of the
associated 3-parameter Weibull densities showed them to be
consistent with the data with only very small probability
being associated with the interval between $\alpha$ and 0. In
Figure 1 we show two 3-parameter Weibull densities, one
associated with parameters $\alpha = -1182$, $\beta = 1205$, and $\gamma = 324$
while the other density has parameters $\alpha = 0$, $\beta = 21.2$,
$\gamma = 5.8$. We see that the densities are very similar although
the parameter values differ dramatically. Thus, the
3-parameter Weibull seems to suffer from a "practical
non-identifiability" which may or may not be a problem in
our setting. If only proportion estimates are desired, then
this lack of "identifiability" of the component Weibulls may
not cause difficulties. It is clear, however, that the component
Weibull parameter estimates can be very misleading.
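
The two parameter sets quoted for Figure 1 can be compared numerically. The sketch below evaluates the mean and standard deviation implied by each set, together with the probability that the first density assigns to values below zero. The function names are illustrative assumptions, and the comparison by moments is only a rough stand-in for the visual comparison of the plotted densities.

```python
import math


def weibull_cdf(x, alpha, beta, gamma):
    """Distribution function of the three-parameter Weibull."""
    if x <= alpha:
        return 0.0
    return 1.0 - math.exp(-(((x - alpha) / beta) ** gamma))


def weibull_mean_sd(alpha, beta, gamma):
    """Mean and standard deviation from the gamma-function expressions."""
    g1 = math.gamma(1.0 + 1.0 / gamma)
    g2 = math.gamma(1.0 + 2.0 / gamma)
    return alpha + beta * g1, beta * math.sqrt(g2 - g1 ** 2)


fit_a = (-1182.0, 1205.0, 324.0)  # (alpha, beta, gamma) with very large location and scale
fit_b = (0.0, 21.2, 5.8)          # (alpha, beta, gamma) with moderate values

mean_a, sd_a = weibull_mean_sd(*fit_a)
mean_b, sd_b = weibull_mean_sd(*fit_b)
mass_below_zero = weibull_cdf(0.0, *fit_a)

print(f"fit A: mean = {mean_a:.2f}, sd = {sd_a:.2f}, P(X < 0) = {mass_below_zero:.4f}")
print(f"fit B: mean = {mean_b:.2f}, sd = {sd_b:.2f}")
```

Despite location and scale parameters that differ by orders of magnitude, the two sets imply means and standard deviations of the same order, and the first places almost no probability between its location parameter and zero.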
(4) LANDSAT Data

The relative performance of the MDE and MLE has been
examined through extensive simulation investigations. These
investigations have been an important first step in
understanding the behavior of the estimators in controlled
normal and non-normal mixtures. It has been shown that lack
of component normality can severely degrade the performance
of the MLE. In fact, we have seen that mild non-normality
(double exponential) can cause the "optimal" MLE to perform
in a less than optimal manner. Another key concern involves
the symmetry of component distributions since our
simulations have shown that skewness can have adverse
effects on normal based procedures.

The performance of the estimation schemes on LANDSAT
data is, of course, of ultimate importance. The key
questions which are of interest in this respect are:

(a) Are the feature variables from the Badhwar profile 
model normal? If not, what type of non-normality is 
encountered? 

(b) How do the estimation procedures compare on this data? 

In an attempt to provide answers to these questions, we
have utilized data from the Fundamental Research Data Base.
This data base consists of eighteen segments on which ground
truth and the Badhwar feature variables are available for
each pixel. In our investigations we identified the pure
pixels on each segment and related these back to their
ground truth labels. Our simulation investigations have been
based on mixtures of two univariate component distributions.
Therefore, the current interest concerns the ability of the
estimation procedures to estimate crop proportions in this
univariate, two component, real data setting. Accordingly,
we identified "pairs" of crops from these 18 segments for
which proportion estimation would be useful. That is, from a
given segment we identified two crops, say corn and
soybeans, and considered the related pixels to constitute a
mixture population. In an attempt to further understand the
data for these mixture populations, histograms of the
component distributions and of the mixture distribution were
drawn for each of the three feature variables $T_p$, $V$, and
$G(T_p)$. In Figures 2-7 we display these histograms for the
corn and soybean pure pixels of Segment 1380, a 1978
Minnesota segment. Several observations can be made
concerning the histograms. First, there is clear visual
separation between corn and soybeans on the basis of $G(T_p)$,
a small amount of separation on $T_p$, and no separation on $V$.
Notice that what appears to be a second peak in the mixture
histogram for $V$ in Figure 5 appears as a "spurious" peak in the
component distribution for soybeans (see Figure 4). This
leads to a second observation, which is that the quality of
the data is very questionable. The peak in the soybean
component should be explained. Further, the figures indicate
that outliers are a major problem. For example, note
the extreme values for each profile variable, particularly
for the soybean components. In order to correctly analyze these
data, the outliers must be more fully understood. Outliers
could arise from several sources. Among these are
incorrectly specified ground truth readings, crops which
were plowed under after the ground truth readings were made,
and extreme values which result from instability of
parameter estimation in the Badhwar model. Our examination
of all of the histograms reveals that outliers are in
general most prevalent for $V$. We do not at present
understand the outliers observed here, but their magnitude
is significant enough to warrant further investigation.

Figure 5. Mixture Histogram based on V for Corn
and Soybean Components of Figure 4

Figure 6. Histograms of G(T_p) for Corn and Soybean Component
Distributions based on Pure Pixels from Segment 1380

Although the mixtures displayed in Figures 3 and 7 are
bimodal, a general impression after examining all of the
histograms is that for many of the crop comparisons, the
mixture histograms are not bimodal for any of the profile
variables. This, of course, calls into question the usefulness
of the profile variables for separating crops.
Based upon our examination of the data, we are able to make
some very general comments concerning the crop separation.
For the segments we observed, none of the three variables
produced histograms from which a separation was visible when
comparing:

grass vs. spring small grains
spring wheat vs. other spring small grains
spring wheat vs. spring barley
corn vs. trees
grass vs. pasture

In contrast, visual separation was present for the following
comparisons:

corn vs. soybeans ($T_p$ and $G(T_p)$)
cotton vs. spring small grains ($T_p$ and $G(T_p)$)
sunflower vs. spring wheat ($T_p$ and $G(T_p)$)
pasture vs. alfalfa ($G(T_p)$)

Of course, multivariate examinations of these variables 
might detect separations which we are unable to observe in 
the univariate setting. 

We also examined the performance on the LANDSAT data of the
estimators studied in the simulation studies. In
order to do this we sampled from the mixture populations
described earlier. Specifically, for selected "crop pair
populations" we selected 100 samples of size n = 200,
obtaining the MDEN, MLEN, and MDEW for each sample. The
results of this "data simulation" were then summarized in
much the same way as were the simulations presented earlier.
In Table 3 we present the results for estimating the mixing
proportion based upon the corn-soybean mixture from Segment
1380. The ground truth proportion there is p = .43 (proportion
of pixels in the mixture which are corn). From the table we
see that the estimation results are very poor for all
estimation procedures. Examination of the histograms (see
Figures 2-7) reveals that the outliers discussed earlier
are probably the major cause of this poor performance. The
starting value results deserve special attention. The
starting values are restricted to p = .1, .2, ..., .9, and they
are selected in such a way that if fewer than 5% of the data
are "extreme" in either direction, then this has very little
effect on the starting values. With outliers as extreme and
as numerous as the ones in the present data, the starting
value routine often interprets the extreme 10% of the data
as constituting a component. Thus we see the extremely poor
starting value results in Table 3. In an effort to examine
the effect of the outliers on the results in Table 3 we
truncated the most extreme observations and repeated the
simulations. In particular, all $T_p$ observations below 60 and
above 150 were truncated, all $V$ observations above 80 were
truncated, and for $G(T_p)$, all observations below 10 and
above 120 were truncated. These truncations were performed
independently for each variable so that the ground truth
proportions differ from profile to profile. These ground
truth readings are given in Table 3. A truncation based on
all three criteria together might be of interest since, for
example, the spurious peak at about $V = 53$ for the soybean
data in Figure 4 may be associated with extreme values of
one of the other two variables, and thus these values of $V$
would be removed by such a joint truncation procedure. The
results of the simulations on truncated data are given in
Table 3. There it can be seen that the performance of the
estimators improves dramatically. However, it should be
noted that extreme observations seem to continue to have an
effect on the results. Notice that $G(T_p)$ appears to be the
best single variable for separating corn and soybeans, in
which case MDEW results are superior. Simulations similar to
those reported here were obtained for several crop-pair
mixtures. In general, although visible separation sometimes
existed between the two components, estimation results were
usually very poor because of the outliers.

Table 3. Results of "Data Simulation" based on Corn and
Soybean Pure Pixels from Segment 1380

Sample Size = 200
Number of Replications = 100

Data Not Truncated
Ground Truth p = .43 (Proportion Corn)

              T_p                V              G(T_p)
Method   p-hat     MSE     p-hat     MSE     p-hat     MSE
MDEN       .59     .07       .64     .05       .63     .10
MLEN       .83     .17       .77     .13       .89     .23
MDEW       .73     .15       .62     .08       .67     .11
Starts     .89     .22       .85     .18       .79     .15

Data Truncated
Ground Truth (Proportion Corn), by variable

              T_p                V              G(T_p)
            p = .44           p = .43           p = .45
Method   p-hat     MSE     p-hat     MSE     p-hat     MSE
MDEN       .47     .04       .61     .04       .58     .03
MLEN       .71     .09       .64     .06       .44     .05
MDEW       .49     .09       .57     .05       .54     .02
Starts     .88     .20       .83     .16       .61     .03
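
The per-variable truncation step described above, and the reason the ground truth proportion shifts after truncation, can be sketched as follows. The function names and the ten pixel values are hypothetical and are not data from Segment 1380; only the cutoffs of 60 and 150 for the time-of-peak-greenness variable come from the text.

```python
def truncate(values, labels, lower=None, upper=None):
    """Drop observations outside [lower, upper], keeping crop labels paired with values."""
    kept = [
        (v, c)
        for v, c in zip(values, labels)
        if (lower is None or v >= lower) and (upper is None or v <= upper)
    ]
    return [v for v, _ in kept], [c for _, c in kept]


def proportion(labels, crop="corn"):
    """Ground truth proportion of one crop among the labeled pixels."""
    return sum(1 for c in labels if c == crop) / len(labels)


# Hypothetical pure-pixel values of one feature variable with two outliers.
tp_values = [95, 102, 110, 30, 118, 125, 131, 160, 140, 99]
tp_labels = ["corn", "corn", "corn", "soybeans", "soybeans",
             "soybeans", "soybeans", "soybeans", "soybeans", "corn"]

p_before = proportion(tp_labels)
kept_values, kept_labels = truncate(tp_values, tp_labels, lower=60, upper=150)
p_after = proportion(kept_labels)
print(f"pixels kept: {len(kept_values)}, corn proportion before = {p_before:.2f}, after = {p_after:.2f}")
```

Because the outliers removed need not be split evenly between the two crops, each variable's truncation yields its own ground truth proportion, as the separate values listed in Table 3 indicate.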

The symmetry of the component distributions is one of 
our main interests. However, the outliers tend to diminish 
our ability to examine skewness. Although many of the 
component distributions appeared to be nearly symmetric, we 
have observed skewness to the right in several cases; see,
for example, the component distributions for $V$ in Figure 4.

Many of the comments which are made here are based on
our examination of all of the histograms and "data
simulation" results which were obtained from our processing
of the segment data. Although these displays and results
cannot all be included here, we believe that the ones we
have presented are sufficient to provide an understanding of
the data. The histograms and data simulation results have
been provided to Dr. Dick Heydorn at Johnson Space Center.

Summary 

The results of our investigations have provided new 
insight into the role of non-normality and the performance 
of the MLE presently used for crop proportion estimation. In 
addition we have examined several alternatives to the 
normal-based MLE for estimating mixing proportions. We 
believe, however, that further research is needed in this 
area. In particular, the extension of the investigations to 
situations in which more than two components are present 
would be a natural next step. Further extensions to the 
multivariate case also seem to be of importance. 

The MHDE appears to have some real potential as an 
estimator due to its efficiency under normality. However, 
much work is necessary before it can be determined whether 
or not it is a viable alternative. 

The role of symmetry of the component distributions in
the performance of the estimation procedures still requires
examination. In particular, if the asymmetry can be assumed
to be in only one direction (probably to the right) then we
believe that the estimation results shown in Table 2 can be
improved. The practical importance of the
"non-identifiability" observed in the 3-parameter Weibull is
not yet fully understood. In addition, possible new
alternatives to the Weibull and normal component models
considered to date should be explored.

The simulation results concerning the performance of 
the simple starting value routine we developed imply that 
further research into its capabilities is warranted. 

Finally, the examination of the estimation procedures 
on LANDSAT data is only in its initial stages. The problem 
with outliers and how best to deal with them is a very
important question related to the implementation of these 
techniques on LANDSAT data. Although the MDE procedures 
examined in our investigations are relatively insensitive to 
outliers, the magnitude and quantity of outliers present in 
the data we observed had very deleterious effects on all 
estimation procedures examined. 
MINIMUM DISTANCE ESTIMATION OF MIXTURE MODEL PARAMETERS -
ASYMPTOTIC RESULTS AND SIMULATION COMPARISONS 
WITH MAXIMUM LIKELIHOOD 


Wayne A. Woodward, William C. Parr, 

William R. Schucany, and Henry L. Gray 

1. Introduction 

An important problem in aerospace remote sensing is the
estimation of the mixing proportions $p_1, p_2, \ldots, p_m$ in the
mixture density

$$f(x) = p_1 f_1(x) + p_2 f_2(x) + \cdots + p_m f_m(x),$$

where $m$ is the number of components (crops) in the mixture
and, for component $i$, $f_i(x)$ is a density. The variable of
interest, X, is some measurement such as the reflected 
energy in four bands of the light spectrum as measured by 
the LANDSAT satellite, certain linear combinations of these 
readings, or other derived "feature" variables. 

Generally, parameter estimation in mixture model 
applications has been accomplished by assuming that the 
component distributions are normal and using maximum 
likelihood (ML) techniques. In a recent report, Woodward et
al. (1982) examined the use of minimum distance (MD)
estimation based on the Cramer-von Mises distance as an
alternative to maximum likelihood. Both the ML and MD estimation
schemes in that paper were based upon the mixture of
univariate normal distributions whose density function is
given by

$$f(x) = \frac{p}{\sqrt{2\pi}\,\sigma_1}\exp\left\{-\frac{(x-\mu_1)^2}{2\sigma_1^2}\right\} + \frac{1-p}{\sqrt{2\pi}\,\sigma_2}\exp\left\{-\frac{(x-\mu_2)^2}{2\sigma_2^2}\right\},$$

where all 5 parameters $\mu_1$, $\sigma_1$, $\mu_2$, $\sigma_2$, and $p$ are unknown. It
was also assumed that no training data are available, i.e.,
the only observations are from the mixture distribution. In
this setting, motivated by the crop example, $p$ is the
parameter of paramount importance while location and scale
of the components are nuisance parameters. Woodward et al.
(1982) compare ML and MD estimation techniques on simulated
mixtures of normal, t(4), and chi-square(9) densities with
varying amounts of separation. The results indicate that the
MDE is more robust than the MLE to symmetric departures from
component normality, while neither technique provides 
satisfactory results when component distributions are 
skewed. 

In this report, we present further simulation results 
comparing ML and MD estimation of the mixing proportion 
based on a mixture-of-normals model, when in fact the 
component distributions are not normal, yet represent 
symmetric departures from normality. Unless otherwise 
indicated, reference to the MDE in this report will involve 
the use of Cramer-von Mises distance. We also present 
asymptotic results which establish the strong consistency 
and asymptotic normality of MD estimators of the parameters
in the mixture-of-normals model, and finally provide 
asymptotic relative efficiencies for comparing the MLE and 
MDE in this setting. 
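
As an illustration of minimum distance estimation with the Cramer-von Mises distance, the sketch below evaluates the usual computing form of the statistic for a two-component normal mixture and picks the mixing proportion that minimizes it. It is a deliberately simplified stand-in and not the iterative procedure of the report: the component means and standard deviations are held fixed at known values, the search is a plain grid over $p$, and the sample is a deterministic set of quantiles rather than simulated data. All function names are assumptions made for illustration.

```python
from statistics import NormalDist


def mixture_cdf(x, p, mu1, s1, mu2, s2):
    """Distribution function of the two-component normal mixture."""
    return p * NormalDist(mu1, s1).cdf(x) + (1.0 - p) * NormalDist(mu2, s2).cdf(x)


def cvm_distance(sample, p, mu1, s1, mu2, s2):
    """Computing form of n times the Cramer-von Mises distance:
    1/(12n) + sum over i of (F(y_(i)) - (i - 0.5)/n)^2 on the ordered sample."""
    ys = sorted(sample)
    n = len(ys)
    total = 1.0 / (12.0 * n)
    for i, y in enumerate(ys, start=1):
        total += (mixture_cdf(y, p, mu1, s1, mu2, s2) - (i - 0.5) / n) ** 2
    return total


def grid_search_p(sample, mu1, s1, mu2, s2, step=0.01):
    """Mixing proportion on a grid in (0, 1) that minimizes the distance."""
    grid = [round(k * step, 10) for k in range(1, int(round(1.0 / step)))]
    return min(grid, key=lambda p: cvm_distance(sample, p, mu1, s1, mu2, s2))


# Deterministic toy sample: 50 quantiles of N(0, 1) and 150 quantiles of N(4, 1),
# which corresponds to a mixing proportion of .25 for the first component.
comp1 = [NormalDist(0.0, 1.0).inv_cdf((i + 0.5) / 50) for i in range(50)]
comp2 = [NormalDist(4.0, 1.0).inv_cdf((j + 0.5) / 150) for j in range(150)]
sample = comp1 + comp2

p_hat = grid_search_p(sample, 0.0, 1.0, 4.0, 1.0)
print(f"minimum distance estimate of p on the grid: {p_hat:.2f}")
```

In the full problem all five parameters are unknown and must be searched jointly, which is why an iterative routine with starting values is needed.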

2. Simulation Results 

In this section we report the results of a Monte Carlo 
study designed to compare the ML and MD estimators based 
upon a mixture-of-normals when the simulated component 
distributions are normal and when they are non-normal. These 
comparisons are made under varying degrees of separation 
between the two component distributions. All computations 
were performed on the CDC 6600 at Southern Methodist 
University. 

In these simulations, the mixing proportion, p, takes 
on the values .25, .50, and .75. For a given mixture, the 
component distributions differ from each other only in 
location and scale. In particular, $f_1(x)$ is taken to be the
density associated with a random variable $X = aY$ while $f_2(x)$
is the density for $X = Y + b$ where $a > 0$, $b > 0$. Thus, $a$ is the
ratio of scale parameters for the densities $f_1$ and $f_2$, and
similarly, $b$ is the difference in location parameters. The
random variable $Y$ in our simulations is either normal,
Student's t with 2 or 4 degrees of freedom, or double
exponential. In our simulations we use $a = 1$ and $a = \sqrt{2}$ while $b$
is selected to provide the desired separation between the
component distributions. The number of modes of the mixture 
density depends to a large extent on this separation between 
the two component distributions. Although, for sufficient 
separation, the mixture model has a characteristic bimodal 
shape, the density may be unimodal when there is only
moderate separation between the components, and in this
case, parameter estimation is more difficult than it is in
the bimodal cases. For purposes of quantifying this
separation between the components, a measure of "overlap"
between two distributions was defined by Woodward et
al. (1982).

For each set of parameter configurations, 500 samples 
of size n=100 were generated from the corresponding mixture 
distribution. Simulations were based on the IMSL 
multiplicative congruential uniform random number generator 
GGUBS. Normal component observations were generated using 
IMSL subroutine GGNPM which uses the polar method, while 
t(n) observations were based on the ratio of independent 
chi-square and normal deviates, each obtained using IMSL 
routines. Double exponential components were based on ln(U)
where U is uniform(0,1), and randomly assigning either a
positive or negative sign. In all cases, observations from
the basic component distribution under investigation were
simulated and then assigned to either component 1 or
component 2 depending upon whether an independent
uniform(0,1) was less than or greater than p. The
observations were then scaled and shifted (with a and b) to
provide observations from the appropriate component.
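The sampling scheme just described can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's standard `random` module in place of the IMSL routines GGUBS and GGNPM, and the function names `draw_base` and `draw_mixture_sample` are hypothetical, not taken from the report.

```python
import math
import random


def draw_base(kind, rng):
    """Draw one observation Y from the basic (unscaled, unshifted) distribution."""
    if kind == "normal":
        return rng.gauss(0.0, 1.0)
    if kind in ("t2", "t4"):
        # Student's t: a standard normal over the root of an independent
        # chi-square deviate divided by its degrees of freedom.
        df = 2 if kind == "t2" else 4
        z = rng.gauss(0.0, 1.0)
        chi_square = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        return z / math.sqrt(chi_square / df)
    if kind == "dexp":
        # Double exponential: exponential magnitude from ln(U), random sign.
        u = 1.0 - rng.random()  # lies in (0, 1], so log(0) cannot occur
        magnitude = -math.log(u)
        return magnitude if rng.random() < 0.5 else -magnitude
    raise ValueError("unknown component type: " + kind)


def draw_mixture_sample(n, p, a, b, kind, rng):
    """Sample n observations from p*f_1 + (1-p)*f_2 with X = aY and X = Y + b."""
    sample = []
    for _ in range(n):
        y = draw_base(kind, rng)
        if rng.random() < p:
            sample.append(a * y)  # component 1: scaled by a
        else:
            sample.append(y + b)  # component 2: shifted by b
    return sample


rng = random.Random(1983)
one_sample = draw_mixture_sample(100, 0.25, math.sqrt(2.0), 3.0, "t4", rng)
```

The shift b = 3.0 in the usage line is an arbitrary illustrative value; in the report b is chosen to give the desired overlap.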


For each sample simulated, both the MDE and MLE were
obtained. The iterative procedures discussed by Woodward et.
al. (1982) were implemented in such a way that acceptable
parameter estimates are obtained for each sample. For
example, if the iterative procedure fails to converge in the
specified number of iterations, the last value obtained in
the iteration is taken to be the estimate if this value is
"reasonable" according to preset criteria. In general, if
any of the following conditions existed at any step in the
iteration,

A 

> Y n - (= sample range) 

-v 

- Y n" Y l 

*2 >5r „ + -fr 1 


iteration is terminated and the corresponding estimate is
taken to be the starting value. This did not occur in any of
the 500 repetitions for most configurations, but did occur
a maximum of 7 times out of 500 for MD estimates of the
parameters of a mixture of t(2) components. The extreme
observations which occasionally appear in samples from t(2)
mixtures also forced a modification in the first step of
the MLE iteration to avoid a division by zero. Although both
estimation procedures provide estimates of all 5 of the
parameters, only the results for estimation of p will be 
tabulated since the mixing proportion is the parameter of 
primary interest, as previously mentioned. In addition, when 
dealing with the non-normal mixtures, the remaining 
parameter estimates often do not have a meaningful 
interpretation. 

In Table 1 we present summary results of the 
simulations comparing the performance of the MLE and MDE for 
mixtures of normal components while in Table 2 we display 
the results for the non-normal components. The results for 
normal and t(4) components were previously given in Woodward 
et. al. (1982). Estimates of the bias and MSE based upon the
simulations are given by:

$$\mathrm{Bias} = \frac{1}{n_s}\sum_{i=1}^{n_s} (\hat{p}_i - p)$$

and

$$\mathrm{MSE} = \frac{1}{n_s}\sum_{i=1}^{n_s} (\hat{p}_i - p)^2$$

where $n_s$ is the number of samples, and $\hat{p}_i$ denotes an
estimate of $p$ for the ith sample. It should be noted that
nMSE is the quantity actually given in the tables since this
facilitates comparison with asymptotic variances in
Section 4. Since the MLE and MDE are both asymptotically
unbiased (this will be discussed for the MDE in the next
section), $n_s\,\mathrm{MSE}/\sigma^2$ is approximately $\chi^2(500)$. It is easy to
show, then, that the approximate standard error of a tabled
nMSE is (.0632)(nMSE). In addition, we also provide the
ratio

$$\hat{E} = \frac{\mathrm{MSE(MLE)}}{\mathrm{MSE(MDE)}}$$

as an empirical relative efficiency measure.

Table 2. Simulation Results for Mixtures
of Non-normal Components

Sample size = 100
Number of replications = 500

[Panels include Double Exponential Components and t(4) Components,
with Overlap = .10 among the headings. Columns: p, Ratio of Scale
Factors (a), estimator (MDE, MLE, Start), Bias, nMSE, $\hat{E}$,
MDE Closer. The tabulated entries are not recoverable from this copy.]

In order to take advantage of the paired nature of our
ML and MD estimates, we counted the proportion of samples
for which $\hat{p}_D$ is closer to $p$ than is $\hat{p}_L$, where $\hat{p}_D$ and $\hat{p}_L$
denote the MD and ML estimates respectively. We present this
proportion in the tables under the heading "MDE Closer".
This provides an estimate of $P\{|\hat{p}_D - p| < |\hat{p}_L - p|\}$. The
standard error of the binomial proportions shown in the
tables is no greater than $\sqrt{(.5)(.5)/500} = .022$.
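The two standard-error figures quoted in this section follow from short calculations, sketched here as a check. The first assumes, as the text does, that $n_s\,\mathrm{MSE}/\sigma^2$ is approximately chi-square with $n_s = 500$ degrees of freedom, whose variance is $2n_s$, so the standard error of an estimated MSE is about $\sqrt{2/n_s}$ times the MSE. The second is the binomial bound at proportion .5. The variable names are illustrative only.

```python
import math

n_s = 500  # number of simulated samples per configuration

# A chi-square(n_s) variable has variance 2*n_s, so SE(MSE) ~ MSE * sqrt(2/n_s).
se_factor_nmse = math.sqrt(2.0 / n_s)

# A binomial proportion has its largest standard error at proportion 0.5.
se_bound_proportion = math.sqrt(0.5 * 0.5 / n_s)

print(round(se_factor_nmse, 4))       # 0.0632
print(round(se_bound_proportion, 3))  # 0.022
```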



Analyzing the results, and as can be seen by
inspection, we find that the estimated Bias and MSE
associated with the MLE were generally smaller than those
for the MDE when the components were actually normally
distributed. This relationship between the estimators held
for both overlaps. The MLE and MDE were quite similar at
p=.5 while for p=.25 and p=.75 the superiority of the MLE is
more pronounced.

For the mixtures of non-normal components, the 
relationship between MDE and MLE is reversed in that the MDE 
generally has the smaller estimated Bias and MSE, especially 
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for t(2) mixtures. The superiority of the MDE is due in part 
to the heavy tails in these components. The MLE often 
interpreted an extreme observation as being the only sample 
value from one of the populations with all remaining 
observations belonging to the other. Due to the well known 
singularities associated with a zero variance estimate for a 
component distribution, Day(1969), we were concerned that 
the observed behavior of the MLE was due to the fact that 
the variances were not constrained away from zero.
However, simulation results in which equal variances were
assumed (which removes the singularity) and also those that
used a penalized MLE suggested by Redner(1980) were very
similar to those quoted here.

A surprising result which was previously noted by
Woodward et. al. (1982) is that the starting values obtained
using the procedure outlined in Section 3 produced
estimators that were competitive with both the MLE and MDE.
For both the normal and non-normal mixtures, the MSEs
associated with the starting values were generally lower
than those for the MDE and MLE when overlap=.10. However,
when overlap=.03, the starting value estimates were
generally poorer than those for the MDE and MLE, except for
the t(2) mixtures for which the MLEs were the poorest.
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3. Asymptotic Distribution Theory for Minimum 
Cramér-von Mises Distance Estimation

Asymptotic theory for minimum Cramer-von Mises distance
estimators for location parameters can be found in Parr and
Schucany (1980), and for the general one parameter case in
Parr and de Wet (1981). Bolthausen (1977) gives results for
the multiparameter case, but with conditions which are so
strict as to rule out scale parameters for unbounded random
variables (see his condition III). The purpose of the
results in this section is to extend this previous work to
cover multiparameter situations including, among others, the
problem of normal mixtures.

Assume that at stage n we observe real-valued
$X_1,\ldots,X_n$ iid from a distribution with cdf $G$ and let $G_n$
denote the usual empirical distribution function. Let
$\mathcal{F}=\{F_\theta : \theta\in\Theta\subset R^k\}$, the projection model, be a family of
continuous distribution functions and assume that $G\in\mathcal{F}$,
i.e., $G=F_{\theta_0}$ for some $\theta_0\in\Theta$. Further, assume that there
exists an open set $A\subset\Theta$ with $\theta_0\in A$. Also consider the
following continuity (C) and differentiability (D) conditions:

(C) If $\theta_n\in\Theta$, n = 1,2,..., then

$$\lim_{n\to\infty}\int_{-\infty}^{\infty} \left(F_{\theta_n}(x) - F_{\theta_0}(x)\right)^2 dF_{\theta_0}(x) = 0$$

implies $\lim_{n\to\infty}\theta_n = \theta_0$.

(D) There exists a function $\eta : (0,1)\to R^k$ such that

$$\sup_{-\infty<x<\infty}\left|F_\theta(x) - F_{\theta_0}(x) - (\theta-\theta_0)'\eta(F_{\theta_0}(x))\right| = o(\|\theta-\theta_0\|)$$

as $\|\theta-\theta_0\|\to 0$, where $\|\cdot\|$ is the usual Euclidean
norm on $R^k$, and $\int_0^1 \eta_i^2(u)\,du < \infty$ for $i = 1,2,\ldots,k$ where
$\eta'(u) = (\eta_1(u), \eta_2(u), \ldots, \eta_k(u))$.

Notes:

1) Condition C is satisfied if, for instance, $F_\theta(x)$ is
continuous in $\theta$ at $\theta_0$, pointwise in x (use dominated
convergence). It can be interpreted as requiring that $\theta$
"continuously parametrize" $\mathcal{F}$.

2) If condition C is not satisfied, then this implies
$\sup_{-\infty<x<\infty}|F_\theta(x) - F_{\theta_0}(x)|$ can be arbitrarily small without having $\theta$
approach $\theta_0$. In such a case, the search for any consistent
estimator seems hopeless. In particular, in such a
situation, any consistent estimating functional must be
discontinuous with respect to the sup-norm, and hence highly
nonrobust.

3) Condition D is weaker than (implied by) quadratic
mean differentiability of $f_\theta^{1/2}$ - the canonical regularity
condition for asymptotic normality of the maximum likelihood
estimator (see LeCam (1970) and Pollard (1980)).

4) Usually, $\eta_i(u) = \left.\dfrac{\partial F_\theta(x)}{\partial\theta_i}\right|_{x=F_\theta^{-1}(u)}$ and condition D simply
states the uniform validity of the first order Taylor
approximation to $F_\theta(x)$. If k=1 and $\theta$ is a location
parameter, a sufficient condition to imply D) is that $F_\theta$
possess a uniformly continuous density.

Before continuing define the kxk symmetric matrices A 
and B by 

$$A = \{a_{ij}\}, \qquad B = \{b_{ij}\}$$

with

$$a_{ij} = \int_0^1 \eta_i(u)\,\eta_j(u)\,du$$

and

$$b_{ij} = \int_0^1\!\!\int_0^1 \{\min(u,v) - uv\}\,\eta_i(u)\,\eta_j(v)\,du\,dv$$

and assume A to be of full rank. We can now state and
outline the proof of the following strong consistency and
asymptotic normality results.

Theorem 1: Let $\hat\theta_n$ be a minimum distance estimator of $\theta$ for
all n=1,2,.... Then, if condition C holds, $\hat\theta_n\to\theta_0$ with
probability one.

Proof: Clearly, $\int (G_n - F_{\theta_0})^2\,dF_{\theta_0}\to 0$ with probability one,
and hence also $\inf_{\theta\in\Theta}\int (G_n - F_\theta)^2\,dF_\theta\to 0$ with probability one.
Now,

$$\sup_{\theta}\left|\int (G_n - F_\theta)^2\,dF_\theta - \int (F_{\theta_0} - F_\theta)^2\,dF_\theta\right| \le 4\sup_{-\infty<t<\infty}\left|G_n(t) - F_{\theta_0}(t)\right| \to 0$$

with probability one. Hence,

$$\int (F_{\hat\theta_n} - F_{\theta_0})^2\,dF_{\theta_0} = \int (F_{\hat\theta_n} - F_{\theta_0})^2\,dF_{\hat\theta_n} \to 0$$

with probability one, and strong consistency of $\hat\theta_n$ follows
from the assumption.

Theorem 2: Assume conditions C and D and that A is of full
rank. Then, if $f_\theta(x)$ is continuous in $\theta$ at $\theta_0$ for every x,

$$\sqrt{n}\,(\hat\theta_n - \theta_0) \xrightarrow{d} N(0, A^{-1}BA^{-1}).$$

Proof. (Sketched)

Set

$$K_n(\zeta) = n\int \left(G_n - F_{\theta_0+\zeta/\sqrt{n}}\right)^2 dF_{\theta_0+\zeta/\sqrt{n}} \quad\text{for } \zeta\in R^k.$$

Then we have

$$K_n(\zeta) = n\int \left(G_n - F_{\theta_0} - (F_{\theta_0+\zeta/\sqrt{n}} - F_{\theta_0})\right)^2 dF_{\theta_0}
 + n\int \left(G_n - F_{\theta_0} - (F_{\theta_0+\zeta/\sqrt{n}} - F_{\theta_0})\right)^2 d\left[F_{\theta_0+\zeta/\sqrt{n}} - F_{\theta_0}\right]$$

$$= o_p(1) + \int_0^1 \left(U_n(t) - \zeta'\eta(t) - R_n(t)\right)^2 dt,$$

uniformly in $\zeta$ for $\zeta'\zeta\le C$, for any $C<\infty$, where
$\sup_{0<t<1}|R_n(t)|\to 0$ with probability one, also uniformly in $\zeta$
for $\zeta'\zeta\le C$. Here, $U_n(t) = \sqrt{n}\,(G_n(F_{\theta_0}^{-1}(t)) - t)$, $0<t<1$.
By an extension of the argument of Pyke (1970, p. 29-30) to
the present context, we obtain that the limiting law of the
random variable minimizing $K_n(\zeta)$ over $\zeta$ is also that of the
value minimizing

$$\int_0^1 \left(B(t) - \zeta'\eta(t)\right)^2 dt,$$

where B is a Brownian bridge. The result then follows
immediately.
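The final step of the sketch can be spelled out as follows. This is a brief outline rather than the authors' own derivation; $B(t)$ denotes the Brownian bridge and the symbol $B$ without an argument denotes the matrix defined before Theorem 1. Setting the gradient of the limiting criterion to zero gives a linear function of the bridge, which is Gaussian with the stated covariance.

```latex
\begin{aligned}
\frac{\partial}{\partial\zeta}\int_0^1 \bigl(B(t)-\zeta'\eta(t)\bigr)^2\,dt
  &= -2\int_0^1 \eta(t)\,\bigl(B(t)-\eta(t)'\zeta\bigr)\,dt = 0
  \;\Longrightarrow\;
  \hat\zeta = A^{-1}\int_0^1 \eta(t)\,B(t)\,dt, \\
\operatorname{Cov}\!\left(\int_0^1 \eta(t)\,B(t)\,dt\right)
  &= \int_0^1\!\!\int_0^1 \{\min(s,t)-st\}\,\eta(s)\,\eta(t)'\,ds\,dt = B, \\
\hat\zeta &\sim N\bigl(0,\;A^{-1}BA^{-1}\bigr).
\end{aligned}
```

The second line uses the covariance function $E[B(s)B(t)] = \min(s,t) - st$ of the Brownian bridge.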

It can be shown that the mixture of normals model satisfies the 
conditions of both Theorem 1 and Theorem 2. 
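To make the quantities in Theorem 2 concrete, the following sketch evaluates A, B, and $A^{-1}BA^{-1}$ numerically for the simplest case, the one-parameter N($\theta$, 1) location family, where $\eta(u) = -\phi(\Phi^{-1}(u))$. This example is not taken from the report; the midpoint rule, the grid size, and the function names are assumptions made for illustration. Since the model is regular, the resulting asymptotic variance should exceed the Cramer-Rao bound of 1.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def eta(u):
    """dF_theta(x)/d(theta) at x = F^{-1}(u) for the N(theta, 1) location family."""
    return -STD_NORMAL.pdf(STD_NORMAL.inv_cdf(u))


def cvm_location_variance(m=200):
    """Midpoint-rule approximations to a, b and the sandwich b / a**2 (k = 1)."""
    us = [(i + 0.5) / m for i in range(m)]
    etas = [eta(u) for u in us]
    a = sum(e * e for e in etas) / m
    b = 0.0
    for u, eta_u in zip(us, etas):
        for v, eta_v in zip(us, etas):
            b += (min(u, v) - u * v) * eta_u * eta_v
    b /= m * m
    return a, b, b / (a * a)


a, b, asymptotic_variance = cvm_location_variance()
print(a, b, asymptotic_variance)
```

For k > 1 the same integrals are computed for every pair (i, j) and the scalar division is replaced by matrix inversion.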
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4. Asymptotic Relative Efficiencies 

Theorem 2 of the previous section indicates that for 
the mixture-of-normals model, we have 

$$\sqrt{n}\,(\hat\theta_n - \theta_0) \xrightarrow{d} N(0, A^{-1}BA^{-1}),$$

where $\theta' = (\mu_1, \sigma_1^2, \mu_2, \sigma_2^2, p)$ and $\hat\theta_n$ is the vector of
corresponding MD estimators using Cramer-von Mises distance.
Likewise, it is well known that

$$\sqrt{n}\,(\hat\theta_L - \theta_0) \xrightarrow{d} N(0, I^{-1}(\theta_0)),$$

where $\hat\theta_L$ is the MLE of $\theta_0$ and $I(\theta_0)$ is Fisher's information
matrix. We will employ the usual terminology and refer to
$A^{-1}BA^{-1}$ and $I^{-1}(\theta_0)$ as asymptotic variance-covariance matrices
and to their diagonal elements as asymptotic variances of
the corresponding estimators. In this section we will
present computed asymptotic variances for the MDE of p,
which is denoted by $\hat{p}_D$, and compare these with the asymptotic
variances associated with the MLE, denoted by $\hat{p}_L$.

The components of the matrix A were evaluated using the
expression

$$a_{ij} = \int_{-\infty}^{\infty} \xi_i(x)\,\xi_j(x)\,f_\theta(x)\,dx,$$

where $F_\theta(x)$ and $f_\theta(x)$ denote the distribution function and
density function respectively for the mixture, $\theta_i$ is the ith
component of $\theta$, and

$$\xi_i(x) = \frac{\partial F_\theta(x)}{\partial\theta_i}.$$

This integral was evaluated using IMSL subroutine DCADRE
which employs Romberg extrapolation to perform numerical
integration of an integral over a finite interval. In our
implementation, we used DCADRE to evaluate the integral

$$\int_L^U \xi_i(x)\,\xi_j(x)\,f_\theta(x)\,dx,$$

where $L=\min(-10\sigma_1+\mu_1,\,-10\sigma_2+\mu_2)$ and $U=\max(10\sigma_1+\mu_1,\,10\sigma_2+\mu_2)$
with maximum allowable absolute error specified as 

jc —12 

1.0 X 10 and relative error of 1.0 X 10 . The double 

integral

$$\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} \{F_\theta(\min(x,y)) - F_\theta(x)F_\theta(y)\}\,\xi_i(x)\,\xi_j(y)\,f_\theta(x)\,f_\theta(y)\,dx\,dy$$

involved in calculating the elements of the matrix B is
approximated by using IMSL subroutine DBLIN to perform a
Romberg integration of the integral

$$\int_L^U\!\!\int_L^U \{F_\theta(\min(x,y)) - F_\theta(x)F_\theta(y)\}\,\xi_i(x)\,\xi_j(y)\,f_\theta(x)\,f_\theta(y)\,dx\,dy$$

with maximum allowable absolute error specified as
$1.0\times 10^{-9}$.
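As an illustration of the truncated integration, the sketch below evaluates the single element of A that corresponds to the mixing proportion, using $\partial F_\theta(x)/\partial p = \Phi((x-\mu_1)/\sigma_1) - \Phi((x-\mu_2)/\sigma_2)$ and the limits L and U described above. It is a simplified stand-in, not the report's procedure: a plain trapezoidal rule replaces the Romberg routine DCADRE, and the parameter values in the usage line are arbitrary.

```python
from statistics import NormalDist


def a_pp(mu1, s1, mu2, s2, p, half_width=10.0, steps=2000):
    """Trapezoidal approximation to the (p, p) element of A on [L, U]."""
    comp1, comp2 = NormalDist(mu1, s1), NormalDist(mu2, s2)
    lower = min(mu1 - half_width * s1, mu2 - half_width * s2)
    upper = max(mu1 + half_width * s1, mu2 + half_width * s2)
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lower + k * h
        density = p * comp1.pdf(x) + (1.0 - p) * comp2.pdf(x)
        dF_dp = comp1.cdf(x) - comp2.cdf(x)  # derivative of the mixture cdf in p
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * dF_dp * dF_dp * density
    return total * h


wide = a_pp(0.0, 1.0, 3.0, 1.0, 0.5)
narrow = a_pp(0.0, 1.0, 3.0, 1.0, 0.5, half_width=6.0)
print(wide, narrow)
```

Comparing the two half-widths gives a quick indication of how little mass lies outside the truncation limits.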


The calculation of the information matrix for the 
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mixture-of-normals model is discussed by Behboodian (1972) . 
We have followed Behboodian's procedure and used 
Gauss-Hermite quadrature to approximate the integrals 
involved. Using 48-point quadrature we obtain good agreement 
with Behboodian's tabled results.

In Table 3 we display the asymptotic variances for $\hat{p}_D$
and $\hat{p}_L$ along with asymptotic relative efficiency (ARE)
calculated as

$$\mathrm{ARE} = \frac{\text{asymptotic variance } \hat{p}_L}{\text{asymptotic variance } \hat{p}_D}$$

These values are calculated for each of the parameter
configurations employed in Table 1 for the normal mixtures.
As in Table 1, the asymptotic results indicate that the MDE
compares more favorably with the MLE when p=.5 while its
relative performance is not as good for p=.25 or p=.75.


Table 3 - Asymptotic Relative Efficiencies

                                      Overlap = .10                 Overlap = .03
        Ratio
        of Scale                 Asymptotic                    Asymptotic
  p     Factors (a)  Estimator   Variance         ARE          Variance         ARE

 .25    1            MDE         13.60 (7.80)*    .42 (.55)    .471 (1.09)      .69 (.49)
                     MLE          5.67 (4.26)                  .323 (.539)

 .50    1            MDE          4.54 (3.86)     .65 (.83)    .398 (.420)      .89 (.91)
                     MLE          2.95 (3.21)                  .355 (.382)

 .25    sqrt(2)      MDE         18.77 (5.30)     .32 (.42)    .511 (.956)      .65 (.51)
                     MLE          5.96 (2.25)                  .330 (.489)

 .50    sqrt(2)      MDE          3.49 (2.79)     .68 (.86)    .395 (.441)      .89 (.94)
                     MLE          2.39 (2.41)                  .353 (.416)

 .75    sqrt(2)      MDE          5.51 (8.36)     .58 (.58)    .420 (1.08)      .73 (.44)
                     MLE          3.18 (4.87)                  .305 (.470)

*Associated Monte Carlo results from Table 1 are given in parentheses.
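The ARE column can be reproduced from the asymptotic variances in Table 3, which also serves as a consistency check on the tabled values. The sketch below does this arithmetic; the dictionary layout and names are illustrative, and the value 4.54 for the MDE at p=.50, a=1, overlap .10 is read from a partly illegible entry in the source.

```python
# (p, a) -> {overlap: (MDE variance, MLE variance, tabled ARE)}
TABLE3 = {
    (0.25, "1"): {0.10: (13.60, 5.67, 0.42), 0.03: (0.471, 0.323, 0.69)},
    (0.50, "1"): {0.10: (4.54, 2.95, 0.65), 0.03: (0.398, 0.355, 0.89)},
    (0.25, "sqrt2"): {0.10: (18.77, 5.96, 0.32), 0.03: (0.511, 0.330, 0.65)},
    (0.50, "sqrt2"): {0.10: (3.49, 2.39, 0.68), 0.03: (0.395, 0.353, 0.89)},
    (0.75, "sqrt2"): {0.10: (5.51, 3.18, 0.58), 0.03: (0.420, 0.305, 0.73)},
}


def recomputed_are(var_mde, var_mle):
    """ARE = asymptotic variance of the MLE over that of the MDE."""
    return round(var_mle / var_mde, 2)


mismatches = [
    (config, overlap)
    for config, by_overlap in TABLE3.items()
    for overlap, (var_mde, var_mle, tabled) in by_overlap.items()
    if recomputed_are(var_mde, var_mle) != tabled
]
print(mismatches)  # []
```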
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5. Concluding Remarks 

We believe that the results of this paper provide 
further evidence that the use of the MDE should be 
considered in crop proportion estimation procedures 
developed by NASA. Our results again, and more conclusively
than before, indicate that the MDE is indeed more robust 
than the MLE in the sense that it is less sensitive to 
symmetric departures from the underlying assumption of 
normality of component distributions. 

Woodward et. al. (1983) have investigated basing the MD 
estimation procedure on a mixture of Weibull components in 
order to allow for possible asymmetry in the component 
distributions. Their results indicate that this approach 
provides a viable alternative to the normal-based procedures 
discussed here. Research is also proceeding on the case of 
multiple (>2) components in the mixture. 

The results of Section 4 indicate that the MDE does not 
perform as well as would be hoped when the data actually do 
arise from a mixture-of-normals model. We are currently 
examining the use of the Hellinger metric in this regard due
to the results of Beran(1977) concerning the full asymptotic
relative efficiency of minimum Hellinger distance
estimators.

MINIMUM HELLINGER DISTANCE ESTIMATION
OF MIXTURE MODEL PARAMETERS

Wayne A. Woodward and Paul W. Eslinger 
1. Introduction 

Recent reports by Woodward et. al. (1982,1983) have 
considered minimum distance estimation (MCVMDE) , based on 
Cramer-von Mises distance, as an alternative to maximum 
likelihood (ML) for estimating the parameters of the
mixture-of-normals model. Their results indicate that the 
MCVMDE is more robust to departures from the assumption of 
normal components than is maximum likelihood. In particular, 
they have shown that if mixture-of-normal based MCVMD and ML 
procedures are used to estimate the parameters of a mixture 
of symmetric (but non-normal) distributions such as double 
exponential, t(4), or t(2), then the MCVMDE produces 
superior proportion estimates. However, their results also 
show that when the component distributions actually are 
normal, the MLE is superior. 

Intuitively, robust procedures are those which are 
insensitive to small deviations from the assumptions. 
Typically, robust procedures obtain this robustness at the 
expense of not being optimal at the true model. In fact, 
Bickel(1978) describes robustness as "paying a price in
terms of efficiency at the (true) model in terms of
reasonably good maximum M.S.E. over the neighborhood." The
behavior of the MCVMDE described above is a good example of 
this trade-off. However, Beran(1977) has suggested the use 
of the minimum Hellinger distance (MHD) estimator which has 
certain robustness properties and is asymptotically 
efficient at the true model. Its applicability to aerospace 
remote sensing is of interest since it has the potential of 
providing robust proportion estimates under deviations from 
normality while maintaining performance comparable to the 
MLE when the underlying components actually are normal. In 
this report we will briefly examine the use of the MHDE for 
estimating the parameters of the mixture-of-normals model. 


2. The Minimum Hellinger Distance Estimator 

Let $X_1, X_2, \ldots, X_n$ denote a random sample from some
unknown distribution and let $Y_1, Y_2, \ldots, Y_n$ denote the
corresponding order statistics. Further, let $\mathcal{F} = \{F_\theta : \theta\in\Theta\}$
be a family of distributions, called the projection family
or projection model, depending on the (possibly vector
valued) parameter $\theta$. A minimum distance estimate of $\theta$ is a
value $\hat\theta$ which minimizes the distance between the data
distribution (whose model is unknown) and the projection
model. In particular, the MCVMDE minimizes the Cramer-von
Mises distance between the empirical distribution function
and $F_\theta$. For more discussion, the reader is referred to
Woodward, et. al. (1982).

Hellinger distance between two absolutely continuous
distributions is defined to be $\|f^{1/2} - g^{1/2}\|$ where f and g are
the corresponding densities and $\|\cdot\|$ denotes the usual $L^2$
norm, i.e.

$$\|f^{1/2} - g^{1/2}\| = \left[\int (f^{1/2} - g^{1/2})^2\,dx\right]^{1/2} \qquad (2.1)$$

where integration is with respect to Lebesgue measure on the
real line. Let $\mathcal{F}$ denote the set of all absolutely continuous
probability functions with respect to Lebesgue measure on
the real line, and for our purposes, let $\mathcal{F}_0 = \{F_\theta : \theta\in\Theta\}$, the
projection family, be a parametrized subset of $\mathcal{F}$. The MHD
estimator $\hat\theta_H$ of $\theta$ is defined as a value of $\theta$ which minimizes
$\|f_\theta^{1/2} - g_n^{1/2}\|$ where $g_n$ is a suitable nonparametric density
estimator. It should be noted that minimizing $\|f_\theta^{1/2} - g_n^{1/2}\|$ is
equivalent to maximizing

$$\int f_\theta^{1/2}\,g_n^{1/2}\,dx \qquad (2.2)$$

and we will utilise this form for computational convenience.
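The equivalence holds because both functions are densities: expanding the square in (2.1) gives $\|f^{1/2} - g^{1/2}\|^2 = 2 - 2\int f^{1/2} g^{1/2}\,dx$. The following sketch checks this identity numerically for two normal densities. The particular densities, the integration range, and the trapezoidal rule are arbitrary choices made for illustration and are not part of the report.

```python
import math
from statistics import NormalDist


def trapezoid(func, lower, upper, steps=4000):
    """Composite trapezoidal rule for func on [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for k in range(1, steps):
        total += func(lower + k * h)
    return total * h


f = NormalDist(0.0, 1.0).pdf
g = NormalDist(1.0, 1.5).pdf

squared_distance = trapezoid(
    lambda x: (math.sqrt(f(x)) - math.sqrt(g(x))) ** 2, -15.0, 15.0
)
affinity = trapezoid(lambda x: math.sqrt(f(x) * g(x)), -15.0, 15.0)

# The two quantities printed below should agree closely.
print(squared_distance, 2.0 - 2.0 * affinity)
```

Since the squared distance is a decreasing function of the integral in (2.2), minimizing one is the same as maximizing the other.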

Beran(1977) and Stather (1982) have provided theoretical
results establishing the consistency, asymptotic normality, 
asymptotic full efficiency, and robustness of the MHDE. 
However, their results only briefly discuss the 

computational aspects of implementing the MHDE and provide 
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only limited empirical evidence concerning its robustness. 
In this report we investigate the usefulness of the MHDE for 
estimating the parameters of the mixture-of-normals model. 


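In the mixture-of-normals setting, $f_\theta$ becomes

$$f_\theta(x) = \frac{p}{\sqrt{2\pi}\,\sigma_1}\exp\left\{-\frac{1}{2}\left(\frac{x-\mu_1}{\sigma_1}\right)^2\right\}
 + \frac{(1-p)}{\sqrt{2\pi}\,\sigma_2}\exp\left\{-\frac{1}{2}\left(\frac{x-\mu_2}{\sigma_2}\right)^2\right\} \qquad (2.3)$$

where $\theta = (\mu_1, \sigma_1, \mu_2, \sigma_2, p)'$. In the next section we present the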
results of a simulation study in which the MHDE is 
calculated using the projection model in (2.3). In these 
calculations, we have employed Newton's method to maximize 
(2.2), which produces the iterative algorithm 


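$$\hat\theta^{(m+1)} = \hat\theta^{(m)} - \left[\int \frac{\partial^2 f_\theta^{1/2}}{\partial\theta\,\partial\theta'}\,g_n^{1/2}\,dx\right]^{-1} \int \frac{\partial f_\theta^{1/2}}{\partial\theta}\,g_n^{1/2}\,dx \qquad (2.4)$$

where $\hat\theta^{(m)}$ denotes the estimate of $\theta$ obtained on the mth step,
and $\hat\theta^{(0)}$ denotes the starting value, $(\hat\mu_1^{(0)}, \hat\sigma_1^{(0)}, \hat\mu_2^{(0)}, \hat\sigma_2^{(0)}, \hat{p}^{(0)})'$.
If any step produces estimates of $\sigma_1$ or $\sigma_2$ which are less than
zero, then we use a scaled step "half-way" to zero.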

In the implementation the density estimator used is

$$g_n(x) = \frac{1}{n c_n s_n}\sum_{i=1}^{n} w\!\left(\frac{x - X_i}{c_n s_n}\right)$$

based on the Epanechnikov kernel $w(x) = .75(1-x^2)$ for $|x|\le 1$,
with the scale statistic $s_n$ set to $\hat\sigma_1^{(0)}$ when $\hat{p}^{(0)} > .5$ and $\hat\sigma_2^{(0)}$
when $\hat{p}^{(0)} < .5$. For a discussion of density estimators see Tapia
and Thompson (1978). The value for $c_n$ is given by the
expression $c_n = 2.16\,n^{-.271}$. These values of $c_n$ are optimal for
use with a normal projection model and are used here for
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convenience. Although further investigation into c n values 
which are optimal for use with the projection model of (2.3)
is needed, we believe that the $c_n$ values utilized are
sufficient for the purpose here. When the projection model
in (2.3) is used, it follows that $\int \frac{\partial f_\theta^{1/2}}{\partial\theta}\,g_n^{1/2}\,dx$ in (2.4) is a 5x1
vector while $\int \frac{\partial^2 f_\theta^{1/2}}{\partial\theta\,\partial\theta'}\,g_n^{1/2}\,dx$ in (2.4) is a 5x5 matrix, the elements
of which are integrals to be evaluated at each step of the
iterative procedure. In the Appendix we show the partial
derivatives involved in the calculation of $\frac{\partial f_\theta^{1/2}}{\partial\theta}$ and $\frac{\partial^2 f_\theta^{1/2}}{\partial\theta\,\partial\theta'}$. We
have chosen to evaluate the numerical integrals using the
trapezoidal rule over a grid of 100 steps equally spaced
between $Y_1 - c_n s_n$ and $Y_n + c_n s_n$, i.e. the range of support of
$g_n$.
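The kernel estimator and the trapezoidal grid described above can be sketched as follows. This is a minimal illustration under several assumptions: the projection model is a single normal density fitted by the sample mean and standard deviation rather than the five-parameter mixture, the scale statistic is taken to be the sample standard deviation, the bandwidth constant is passed in as a parameter, and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def epanechnikov(u):
    """Kernel w(u) = .75 (1 - u^2) on |u| <= 1, zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_density(x, data, bandwidth):
    """g_n(x) with bandwidth c_n * s_n."""
    total = sum(epanechnikov((x - xi) / bandwidth) for xi in data)
    return total / (len(data) * bandwidth)


def trapezoid_on_grid(values, step):
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def mass_and_affinity(data, model_pdf, bandwidth, steps=100):
    """Integrate g_n and sqrt(f * g_n) over the support of g_n on 100 steps."""
    lower, upper = min(data) - bandwidth, max(data) + bandwidth
    step = (upper - lower) / steps
    grid = [lower + k * step for k in range(steps + 1)]
    g = [kernel_density(x, data, bandwidth) for x in grid]
    root_products = [math.sqrt(model_pdf(x) * gx) for x, gx in zip(grid, g)]
    return trapezoid_on_grid(g, step), trapezoid_on_grid(root_products, step)


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(100)]
c_n = 2.16 * len(data) ** -0.271
s_n = stdev(data)
model = NormalDist(mean(data), s_n)
mass, affinity = mass_and_affinity(data, model.pdf, c_n * s_n)
print(mass, affinity)
```

The first returned value should be close to 1, which gives a rough check that 100 steps resolve the kernel estimate adequately; the second is the quantity in (2.2) for the fitted model.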


3. Simulation Results 


In this section we report the results of simulations 
designed to provide empirical evidence concerning the 
effectiveness of the MHDE using a mixture-of-normals 
projection model when the component distributions in the 
simulated samples are normal and when they are non-normal. 
In addition, we have made our comparisons under two levels 
of separation between the component distributions. 

In these simulations, we have used parameter 
configurations previously considered by Woodward, et. al. 
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(1983). In particular, we use mixing proportions .25, .50, 
and .75 and "overlaps" as defined by Woodward, et. al. (1982) 
of .03 and .10. Again, as in the previous work, we consider
cases in which the ratio of the standard deviations of
component 1 to component 2 is 1 and when it is $\sqrt{2}$. In these
simulations we have simulated mixtures with normal and t(4)
components. For each set of configurations, 500 samples of
size n=100 were generated from the corresponding mixture
distribution. Simulations were performed on the CDC 760
computer. Starting values were obtained as discussed by
Woodward, et. al. (1982) with the exception that starting
values for the component standard deviations, $\sigma_1$ and $\sigma_2$,
utilized in this study are smaller than those used in the
previous reports (Woodward, et. al. (1982,1983)) by a factor of
approximately 1.2. For each sample simulated, the MCVMDE,
MHDE, and MLE for all 5 parameters were obtained. However, 
only the results for the estimation of p are tabled since 
the mixing proportion is the parameter of interest. 

In Table 1 we present the results for simulated 
mixtures of normal components, while in Table 2 we show the 
results for simulated mixtures of t(4) components. 
Simulation based estimates of the bias and MSE associated
with the various estimators are given by

$$\mathrm{Bias} = \frac{1}{n_s}\sum_{i=1}^{n_s} (\hat{p}_i - p)$$

and

$$\mathrm{MSE} = \frac{1}{n_s}\sum_{i=1}^{n_s} (\hat{p}_i - p)^2$$

where $n_s$ denotes the number of samples and $\hat{p}_i$ denotes an
estimate of p for the ith sample. As in the earlier reports,
nMSE is given in the table where n is the size of each
individual sample (in our case 100). We provide the ratios

$$\hat{E}_{CVM} = \frac{\mathrm{MSE(MLE)}}{\mathrm{MSE(MCVMDE)}}$$

and

$$\hat{E}_H = \frac{\mathrm{MSE(MLE)}}{\mathrm{MSE(MHDE)}}$$

as empirical measures of the relative efficiencies of the
MCVMDE and MHDE respectively with the MLE. An approximate
standard error of a tabled nMSE is (.0632)(nMSE).

Table 2 - Simulation Results for Mixtures of t(4) Components

Sample size = 100
Number of Replications = 500

The results in Tables 1 and 2 illustrate the 
characteristics of the MHDE shown theoretically by 
Beran(1977) and Stather (1982). In particular, for the
simulated mixtures of normal components in Table 1, the MSEs
for the MHDE were comparable (in most instances) to those
for the MLE and smaller than those for the MCVMDE. This
behavior can also be seen by noting that $\hat{E}_H$ is close to 1
for most configurations while $\hat{E}_{CVM}$ is consistently less than
1. However, in Table 2, for simulated mixtures of t(4)
components, $\hat{E}_H$ was greater than 1 in all but one case. In
addition, the robustness shown by the MHDE was in most cases
comparable to that for the MCVMDE as evidenced by similar
values of $\hat{E}_H$ and $\hat{E}_{CVM}$. As noted in the previous reports, see
Woodward, et. al. (1982,1983), the starting value routine
provided good estimates, which in fact were competitive with
those given by ML, MCVM, and MHD techniques.




A few further comments are in order. First, although 
the computational aspects of the MHDE are complex, we found 
the computer time required for the MHDE to be similar to 
that for the other two estimators. The Newton-Raphson 
procedure used to calculate the MHDE is quadratically 
convergent. This usually resulted in convergence within 10 
steps for the MHDE. The MCVMDE also usually converged within
10 iterations while the MLE required more, especially for
the .10 overlap, in which case more than 50 steps were often 
required. However, the MLE is computationally much simpler 
at each step. For a discussion of the computational 
procedures used to calculate the MLE and MCVMDE, see 
Woodward et. al. (1982,1983) . 

The number in parentheses after the MHDE results in the 
table is the number of times (out of 500) that the MHDE 
actually converged. When convergence was not obtained for 
any of the estimators, the estimate was taken to be the 
starting value. For the MCVMDE and MLE, convergence was 
almost always obtained. However, it can be seen that the 
failure of the MHDE to converge was a common occurrence. Of 
course, the results in the tables for the MHD must be viewed 
accordingly, i.e. approximately 20% of the "MHD" estimates 
used in the bias and MSE calculations are actually starting 
values. In some instances, this may improve the performance 
of the MHDE. 

A related observation is that the MHDE seems to be
quite sensitive to starting values. For example, in the
tables, we see that the poorest results for the MHDE are
obtained when p=.75 and the ratio of standard deviations
between components is $\sqrt{2}$. It should be noted that this is
also the situation in which the starting values are the
poorest. While the other two estimators do not seem to be
overly affected by these poor starts, the MHD is quite
sensitive. As noted earlier, the starting values for $\sigma_1$ and
$\sigma_2$ used here are smaller than the intuitively appealing ones
proposed earlier by Woodward, et. al. (1982,1983). Although
we do not understand why, the use of these smaller starting
values improves the performance of the MHDE (and has very
little effect on the MLE and MCVMDE).

In related investigations of the MHDE, we have examined 
its performance on the estimation of the location and scale 
parameters of a univariate normal projection model. In this 
setting we have also seen an extreme sensitivity to starting 
values. In Table 3a we display an array of starting values 
for $\mu$ and $\sigma$ of a univariate normal projection model. Samples
of size n=40 were simulated from a normal distribution with
$\mu=0$ and $\sigma=1$. In Table 3b we provide an associated array
displaying the number of times out of 1000 such samples that 
the iterative routines for the MHDE converged when using the 
corresponding starting values in the array of Table 3a. The 
sensitivity of the MHDE to poor starting values is very 
evident. It should be noted that using the "good" starting
values $\mu^{(0)} = \mathrm{median}\{X_i\}$ and $\sigma^{(0)} = \mathrm{median}\{|X_i - \mu^{(0)}|\}/.6745$, obtained
from the data for each sample, resulted in convergence of the
MHDE for all 1000 of the samples. In contrast to the results
of Table 3b, the MCVMDE converged 1000 times out of 1000 for
each set of starting values in Table 3a, while of course, in
this situation the ML estimators $\bar{X}$ and $S^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$ exist in
closed form.

Table 3 - Effect of Starting Value on
MHD Estimators
of $\mu$ and $\sigma$ from 1000 Simulated
N(0,1) Samples of Size n = 40

(a) Starting Values $(\mu, \sigma)$

  (-1, 1/2)     (-1, sqrt(1/2))     (-1, 1)     (-1, sqrt(2))     (-1, 2)
  (-1/2, 1/2)   (-1/2, sqrt(1/2))   (-1/2, 1)   (-1/2, sqrt(2))   (-1/2, 2)
  (0, 1/2)      (0, sqrt(1/2))      (0, 1)      (0, sqrt(2))      (0, 2)
  (1/2, 1/2)    (1/2, sqrt(1/2))    (1/2, 1)    (1/2, sqrt(2))    (1/2, 2)
  (1, 1/2)      (1, sqrt(1/2))      (1, 1)      (1, sqrt(2))      (1, 2)

(b) Number of Times (out of 1000) that
MHDE Converged Using Starting
Values from Table 3a

   62     43     59    458    284
  210    420    867    866    233
  834    993    999    876    179
  196    415    843    866    224
   71     49     60    463    265
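The data-based "good" starting values for the univariate normal case, the sample median and the median absolute deviation divided by .6745, can be computed as in the short sketch below. The function name `robust_start` is hypothetical, and the five-point data set is only a toy example chosen to show that a single wild observation has little effect on either value.

```python
from statistics import median


def robust_start(sample):
    """Starting values: median for location, scaled median absolute deviation for scale."""
    mu0 = median(sample)
    sigma0 = median(abs(x - mu0) for x in sample) / 0.6745
    return mu0, sigma0


mu0, sigma0 = robust_start([1.0, 2.0, 3.0, 4.0, 100.0])
print(mu0, sigma0)
```

The divisor .6745 is the upper quartile of the standard normal distribution, so the scale value estimates $\sigma$ when the data are normal.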


4. Concluding Remarks 

In this report we have briefly considered the use of 
the MHDE for estimating the parameters of the mixture of 
normals model. The MHDE was of interest originally due to 
its theoretical robustness and asymptotic full efficiency. 
Our empirical results indicate that these properties do hold 
in the mixture setting, at least to some degree. We have 
shown that the MHDE requires computation times which are 
similar to those for the other techniques although it is 
more difficult to calculate. Further research is in progress 
concerning the use of density estimators other than the 
Epanechnikov kernel density estimator. Preliminary results 
indicate that MHD estimates based upon the histogram density 
estimator (Tapia and Thompson (1978) ) require substantially 
less computer time than those based on the Epanechnikov 
kernel, and they have only slightly higher MSEs. 
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The major problem concerning the use of the MHDE 
appears to be with its extreme sensitivity to starting 
values. It is our opinion that, although these convergence
problems could be somewhat alleviated with further 
"fine-tuning" of the iterative algorithm, the implementation 
of the MHDE into segment level proportion estimation 
procedures would be difficult. 
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PROPORTION ESTIMATION IN MIXTURES 
OF ASYMMETRIC DISTRIBUTIONS 


Wayne A. Woodward, Richard F. Gunst, 

Hildegard Lindsey, and H. L. Gray 

Center for Applied Mathematical and Statistical Research 
Southern Methodist University 

1. Introduction 

A standard approach to the estimation of crop 
proportions in agricultural remote sensing has been to 
estimate the proportions $p_1, p_2, \ldots, p_m$ in the mixture density

$$f(x) = p_1 f_1(x) + p_2 f_2(x) + \ldots + p_m f_m(x) \qquad (1.1)$$

where m is the number of components (crops) in the mixture
and $f_i(x)$ is the density associated with component i. The
usual procedure for estimating the parameters in the mixture 
model of (1.1) has been to: 

(a) assume that the component distributions are normal 

(b) use maximum likelihood estimation. 

The variable X has usually been taken to be the
reflected energy in the four LANDSAT bands or some linear
combination of these such as greenness or brightness. Recent
efforts have focused on the use of certain derived features
from growth models, such as $g_{\max}$ and $t_{\max}$, as variables in the
mixture model. Studies have indicated that there is often a
substantial asymmetry in the distributions of these features
for a given crop. Woodward et al. (1982) have shown that
asymmetry in the component distributions can cause a
substantial bias in the proportion estimators when the
mixture of normals model is assumed. As an example, in
Figure 1 we display the mixture density associated with the
mixture of two distributions. Examination of the figure
reveals that if the component distributions are assumed to
be symmetric, then we must conclude that $p_1 < p_2$ and that the
component to the right has larger variance. Actually, in
this mixture $p_1 = p_2$ and the distribution to the left is a
$\chi^2(9)$ while the component to the right is a "shifted" $\chi^2(9)$,
i.e. its left truncation point is at x=10 instead of x=0. It
can be seen that a bias will be introduced in estimating
mixing proportions in this mixture if the component
distributions are assumed to be symmetric, which of course
is the case when the components are assumed to be normal.

In this paper we will discuss techniques for estimating 
the crop proportions in the presence of asymmetric component 
distributions. In particular the estimation procedures we 
will propose assume that the underlying component
distributions belong to some family of distributions whose
members can be either symmetric or skewed depending on 
parameter configurations. At the present time, the Weibull 
distribution is being examined concerning its usefulness in 
this area. The effectiveness of this technique will be 
examined through simulations. 





2. The Weibull Distribution


The Weibull distribution is named after the Swedish
physicist Waloddi Weibull, who used it to represent the
distribution of the breaking strength of materials
(Weibull (1939)). The distribution has been widely used in
recent years in the fields of reliability and quality
control. Its popularity is largely due to its flexibility:
it can be used to describe distributions which are symmetric or
skewed in either direction. For this reason we have chosen
to investigate its applicability to estimation in mixtures
of asymmetric components. The three-parameter Weibull
density can be expressed as

$$f(x) = \frac{\gamma}{\beta}\left(\frac{x-\alpha}{\beta}\right)^{\gamma-1} \exp\left[-\left(\frac{x-\alpha}{\beta}\right)^{\gamma}\right], \qquad x > \alpha;\ \beta, \gamma > 0. \tag{2.1}$$


We will use the notation $X \sim W(a,b,c)$ to indicate that the
random variable X has a three-parameter Weibull distribution
with parameters $\alpha=a$, $\beta=b$, and $\gamma=c$. The parameter $\alpha$ locates
the left truncation point and $\beta$ serves as a scale parameter,
while $\gamma$ determines the shape of the distribution. In
Figure 2 we show Weibull densities for a fixed $\alpha$ and $\beta$ and a
range of values for $\gamma$. From the figure it is clear that the
shape can vary dramatically as $\gamma$ changes. In Figure 3 the
fact that the Weibull density can be skewed to the left as
well as to the right is more clearly demonstrated. For
$\gamma = 3.60232$ approximately, the standardized skewness parameter
$\mu_3/\mu_2^{3/2}$, where $\mu_i$ is the ith central moment, is zero,
indicating symmetry. If $\gamma < 3.60232$ then the Weibull is skewed
to the right, while if $\gamma > 3.60232$ it is skewed to the left.
The Weibull distribution is unimodal, and if $\gamma > 1$ the mode
occurs at

$$x_m = \alpha + \beta\left(\frac{\gamma-1}{\gamma}\right)^{1/\gamma}. \tag{2.2}$$

Otherwise, when $0 < \gamma \le 1$, the mode occurs at $x_m = \alpha$.

Dubey (1967) has studied the Weibull distribution when
$\gamma = 3.60232$ and has concluded that it is very similar to the
normal. In particular, Dubey has shown that

$$\sup_{-3<v<3} |F_Z(v) - F_Y(v)| \tag{2.3}$$

where $F_Z$ denotes the cumulative distribution function of the
random variable $Z \sim N(0,1)$ and Y is the standardized variate
$Y = (X-\mu)/\sigma$, where $\mu$ and $\sigma^2$ are the mean and variance of the
Weibull variate X.

It should be noted that the Weibull distribution is
often given in the literature in two-parameter form, in which
$\alpha$ is assumed to be known (and usually 0). However, unless
otherwise specified, references to the Weibull distribution
in this report will be to the three-parameter form
specified by (2.1).

The cumulative distribution function corresponding to
the three-parameter Weibull is given by the closed form
expression

$$F_X(x) = 1 - \exp\left[-\left(\frac{x-\alpha}{\beta}\right)^{\gamma}\right] \tag{2.4}$$


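while the noncentral moments are given by

$$\mu'_r = \sum_{k=0}^{r} \binom{r}{k} \alpha^{r-k} \beta^{k}\, \Gamma\left(\frac{k}{\gamma}+1\right). \tag{2.5}$$

From (2.5) it can be seen that

$$\mu = \alpha + \beta\,\Gamma\left(\frac{1}{\gamma}+1\right), \qquad \sigma^2 = \beta^2\left\{\Gamma\left(\frac{2}{\gamma}+1\right) - \Gamma^2\left(\frac{1}{\gamma}+1\right)\right\}. \tag{2.6}$$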
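
The quantities in (2.1), (2.4) and (2.6), together with the standardized skewness discussed above, can be checked numerically. The following Python sketch is an illustration only and is not part of the original computations; the function names are our own. It uses only the standard library gamma function.

```python
import math

def weibull_pdf(x, alpha, beta, gamma):
    """Three-parameter Weibull density (2.1); zero for x <= alpha."""
    if x <= alpha:
        return 0.0
    z = (x - alpha) / beta
    return (gamma / beta) * z ** (gamma - 1.0) * math.exp(-z ** gamma)

def weibull_cdf(x, alpha, beta, gamma):
    """Closed-form distribution function (2.4)."""
    if x <= alpha:
        return 0.0
    return 1.0 - math.exp(-((x - alpha) / beta) ** gamma)

def weibull_mean_var(alpha, beta, gamma):
    """Mean and variance from (2.6)."""
    g1 = math.gamma(1.0 / gamma + 1.0)
    g2 = math.gamma(2.0 / gamma + 1.0)
    return alpha + beta * g1, beta ** 2 * (g2 - g1 ** 2)

def weibull_skewness(gamma):
    """Standardized skewness mu_3 / mu_2^(3/2); it depends on gamma only."""
    g1 = math.gamma(1.0 / gamma + 1.0)
    g2 = math.gamma(2.0 / gamma + 1.0)
    g3 = math.gamma(3.0 / gamma + 1.0)
    mu2 = g2 - g1 ** 2
    mu3 = g3 - 3.0 * g1 * g2 + 2.0 * g1 ** 3
    return mu3 / mu2 ** 1.5
```

Evaluating `weibull_skewness` on either side of 3.60232 shows the change of sign described in the text: positive (right skew) for smaller shape values and negative (left skew) for larger ones.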


The first three moments of the Weibull distribution
determine the values of $\alpha$, $\beta$, and $\gamma$. Method of moments
estimators can be obtained using these relationships, but
unfortunately the estimators do not exist in closed form.
The log-likelihood function for a random sample of n
observations from the Weibull distribution is

$$\ln(L) = n\ln\gamma - n\gamma\ln\beta + (\gamma-1)\sum_{i=1}^{n}\ln(x_i-\alpha) - \frac{1}{\beta^{\gamma}}\sum_{i=1}^{n}(x_i-\alpha)^{\gamma}. \tag{2.7}$$


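Differentiating ln(L) yields the following likelihood
equations

$$-(\gamma-1)\sum_{i=1}^{n}(x_i-\alpha)^{-1} + \frac{\gamma}{\beta^{\gamma}}\sum_{i=1}^{n}(x_i-\alpha)^{\gamma-1} = 0 \tag{2.8}$$

$$\beta = \left[\frac{1}{n}\sum_{i=1}^{n}(x_i-\alpha)^{\gamma}\right]^{1/\gamma} \tag{2.9}$$

$$\gamma = \left\{\frac{1}{n}\sum_{i=1}^{n}\left[\ln\left(\frac{x_i-\alpha}{\beta}\right)\right]\left[\left(\frac{x_i-\alpha}{\beta}\right)^{\gamma} - 1\right]\right\}^{-1}. \tag{2.10}$$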


Let $\hat\alpha$, $\hat\beta$, and $\hat\gamma$ denote the estimators obtained from the
simultaneous solution of equations (2.8) to (2.10). If $0 < \hat\alpha < Y_1$,
where $Y_i$ denotes the ith order statistic, these estimators
are the maximum likelihood (ML) estimators for the three
Weibull parameters. However, due to the restriction $x > \alpha$ in
(2.1), if $\hat\alpha \ge Y_1$, then the MLE of $\alpha$ is taken to be $Y_1$ and
$\beta$ and $\gamma$ are estimated from (2.9) and (2.10). As in the case
of method of moments estimators, the ML estimators do not
have a closed form expression. For a general review of the
literature on Weibull parameter estimation see Johnson and
Kotz (1970).
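
As a small numerical check on (2.7) and (2.9), the sketch below evaluates the log-likelihood and the closed-form scale update for fixed $\alpha$ and $\gamma$. It is an illustration under our own naming, not the procedure used in the report, and it assumes $\alpha$ is strictly less than the smallest observation.

```python
import math

def weibull_loglik(xs, alpha, beta, gamma):
    """Log-likelihood (2.7); requires alpha < min(xs), beta > 0, gamma > 0."""
    n = len(xs)
    return (n * math.log(gamma)
            - n * gamma * math.log(beta)
            + (gamma - 1.0) * sum(math.log(x - alpha) for x in xs)
            - sum((x - alpha) ** gamma for x in xs) / beta ** gamma)

def beta_update(xs, alpha, gamma):
    """Equation (2.9): the scale parameter implied by given alpha and gamma."""
    return (sum((x - alpha) ** gamma for x in xs) / len(xs)) ** (1.0 / gamma)
```

For fixed $\alpha$ and $\gamma$, the value returned by `beta_update` maximizes `weibull_loglik` over $\beta$, so perturbing it in either direction lowers the log-likelihood. The remaining equations (2.8) and (2.10) have no such closed form and must be solved iteratively.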


3. Mixtures of Weibull Distributions 


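In order to examine the feasibility of using the
Weibull as a model for the component distributions in the
mixture model of (1.1), we will investigate the estimation
of the parameters in the mixture of two Weibull
distributions. This mixture density is given by

$$f(x) = p\,\frac{\gamma_1}{\beta_1}\left(\frac{x-\alpha_1}{\beta_1}\right)^{\gamma_1-1}\exp\left[-\left(\frac{x-\alpha_1}{\beta_1}\right)^{\gamma_1}\right] + (1-p)\,\frac{\gamma_2}{\beta_2}\left(\frac{x-\alpha_2}{\beta_2}\right)^{\gamma_2-1}\exp\left[-\left(\frac{x-\alpha_2}{\beta_2}\right)^{\gamma_2}\right] \tag{3.1}$$

where the 7 parameters $p$, $\alpha_1$, $\beta_1$, $\gamma_1$, $\alpha_2$, $\beta_2$, and $\gamma_2$ are
assumed to be unknown.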

Previous research in this area includes that of
Kao (1959), who proposed a graphical procedure for estimating
the parameters in (3.1) when one of the location parameters
is assumed to be known and equal to zero. The estimation of
the 6 remaining parameters is accomplished using a graphical
procedure whose applicability to our problem seems to be
limited, although some of his estimation rules could be
automated. Rider (1961) and Falls (1970) propose estimating
the parameters of a mixture of two-parameter Weibulls using
the method of moments. Falls' procedure involves estimating
the mixing proportion p using a graphical procedure similar
to that of Kao.

Maximum likelihood estimation of the parameters of
(3.1) has been discussed by Looney and Bargmann (1982).
Differentiating the log-likelihood function

$$\ln(L) = \sum_{i=1}^{n} \ln\left[p f_1(x_i) + (1-p) f_2(x_i)\right]$$

with respect to each of the 7 parameters yields the
likelihood equations

$$(\gamma_j - 1)\sum_{i=1}^{n} f(j|x_i)(x_i-\alpha_j)^{-1} - \frac{\gamma_j}{\beta_j^{\gamma_j}}\sum_{i=1}^{n} f(j|x_i)(x_i-\alpha_j)^{\gamma_j-1} = 0, \qquad j=1,2 \tag{3.2}$$

$$\beta_j = \left\{\frac{\sum_{i=1}^{n}(x_i-\alpha_j)^{\gamma_j} f(j|x_i)}{\sum_{i=1}^{n} f(j|x_i)}\right\}^{1/\gamma_j}, \qquad j=1,2 \tag{3.3}$$

$$\gamma_j = \left\{\frac{\sum_{i=1}^{n} f(j|x_i)\left[\left(\frac{x_i-\alpha_j}{\beta_j}\right)^{\gamma_j} - 1\right]\ln\left(\frac{x_i-\alpha_j}{\beta_j}\right)}{\sum_{i=1}^{n} f(j|x_i)}\right\}^{-1}, \qquad j=1,2 \tag{3.4}$$

$$p = \frac{1}{n}\sum_{i=1}^{n} f(1|x_i) \tag{3.5}$$

where $f(j|x) = p_j f_j(x)/f(x)$, with $p_1 = p$, $p_2 = 1-p$, $f_j(x)$ denoting the jth
component density and $f(x)$ the mixture density. Solving this
set of equations for the maximum likelihood estimators is
difficult due largely to equations (3.2), which are not in
fixed point form. Looney and Bargmann (1982) suggested a
procedure in which the shape parameters $\gamma_1$ and $\gamma_2$ are fixed
independently at each of the values in a prescribed finite set
and, for each of the $(\gamma_1, \gamma_2)$ pairs, "preliminary" maximum
likelihood estimates of the remaining 5 parameters are
found. A search procedure results in selecting the $(\hat\gamma_1, \hat\gamma_2)$
pair for which ln(L) is maximized. With $\hat\gamma_1$ and $\hat\gamma_2$ fixed at
these values, maximum likelihood estimation for the
remaining 5 parameters is then carried through to
convergence. The Looney and Bargmann procedure for solving
the system of equations (3.2) - (3.5) seems overly
restrictive with respect to the selection of possible values
of the shape parameter, while expansion of the search
procedure to allow for more shape parameter values would
probably be prohibitive because of time constraints.
However, solution of these likelihood equations directly
appears to us to be quite intractable. For these reasons, we
have investigated the use of minimum distance (MD)
estimation, first introduced by Wolfowitz (1957), for
estimating the 7 parameters in the mixture of Weibulls model
given in (3.1). Woodward et al. (1982) have recently studied
the use of MD estimation in the mixture of normals model.
These authors showed that MD estimation was easy to
implement in that setting, and that MD estimators were
superior to ML estimators under departures from component
normality. Since our use of Weibull components is due to the
flexibility which it introduces into the model rather than
underlying theoretical justifications, we definitely need an
estimation procedure which is robust to departures from
assumptions.
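
For reference, the posterior weights $f(j|x)$ that appear throughout (3.2) - (3.5) and the proportion update (3.5) can be sketched in a few lines of Python. This is an illustration only; the function names and the tuple layout (alpha, beta, gamma) for each component are our own assumptions, and every observation is assumed to lie above at least one of the two truncation points so that the mixture density is positive.

```python
import math

def weibull_pdf(x, alpha, beta, gamma):
    """Three-parameter Weibull density (2.1); zero for x <= alpha."""
    if x <= alpha:
        return 0.0
    z = (x - alpha) / beta
    return (gamma / beta) * z ** (gamma - 1.0) * math.exp(-z ** gamma)

def posterior_1(x, p, comp1, comp2):
    """f(1|x) = p f_1(x) / f(x) for the two-component mixture (3.1).

    comp1 and comp2 are (alpha, beta, gamma) tuples (our convention).
    """
    a = p * weibull_pdf(x, *comp1)
    b = (1.0 - p) * weibull_pdf(x, *comp2)
    return a / (a + b)

def update_p(xs, p, comp1, comp2):
    """One application of (3.5): p = (1/n) * sum of f(1|x_i)."""
    return sum(posterior_1(x, p, comp1, comp2) for x in xs) / len(xs)
```

An observation lying below the truncation point of component 2 receives posterior weight 1 for component 1, since the second density vanishes there.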

The minimum distance estimator of the parameter $\theta$
(possibly vector valued) is defined to be that value of $\theta$
which minimizes the distance between $H_\theta$ and $F_n$, where
$H_\theta$ denotes a family of distributions depending on $\theta$
and $F_n$ denotes the empirical distribution function, i.e.
$F_n(x) = k/n$ where k is the number of observations less than or
equal to x. The family of distributions $H_\theta$ is referred to as
the projection model, where in this case
$\theta = (p, \alpha_1, \beta_1, \gamma_1, \alpha_2, \beta_2, \gamma_2)$, and $H_\theta(x)$ is the distribution
function associated with a mixture of two Weibull components
given by

$$H_\theta(x) = p\left[1 - \exp\left\{-\left(\frac{x-\alpha_1}{\beta_1}\right)^{\gamma_1}\right\}\right] + (1-p)\left[1 - \exp\left\{-\left(\frac{x-\alpha_2}{\beta_2}\right)^{\gamma_2}\right\}\right]. \tag{3.6}$$


Note that in contrast to the situation in which the
projection model is taken to be the mixture of two normals,
$H_\theta(x)$ in (3.6) has a closed form expression. The choice of
distance function to be used to measure the distance between
two distributions is a topic of current interest in the
field of MD estimation. Woodward et al. (1982) used the
Cramer-von Mises distance, $W^2$, given by

$$W^2 = \int_{-\infty}^{\infty} \left[G_1(x) - G_2(x)\right]^2 \, dG_2(x) \tag{3.7}$$

where $G_1$ and $G_2$ are two distribution functions, and we have
chosen to use this distance measure in the current study.

The distance between a distribution function $H_\theta$ and the
empirical distribution function $F_n$, which is needed for
calculation of the MD estimator, is given by the simplified
expression

$$W_n^2 = \frac{1}{12n} + \sum_{i=1}^{n}\left[H_\theta(Y_i) - \frac{i-0.5}{n}\right]^2 \tag{3.8}$$

where $Y_i$ denotes the ith order statistic. Since $H_\theta(x)$ exists in
closed form, the MDE in this case is easily obtained by
using nonlinear least squares techniques to minimize (3.8).
We have performed this minimization with IMSL subroutine
ZXSSQ which uses Marquardt's (1963) procedure.
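
The distance (3.8) is simple to evaluate because (3.6) is available in closed form. The Python sketch below computes it and, purely for illustration, minimizes it over p alone by a crude grid search with the other six parameters held at known values. This is not the Marquardt procedure of ZXSSQ used in the study; the parameter values, the seed, and the function names are our own assumptions.

```python
import math
import random

def mixture_cdf(x, theta):
    """H_theta(x) of (3.6); theta = (p, a1, b1, g1, a2, b2, g2)."""
    p, a1, b1, g1, a2, b2, g2 = theta

    def cdf(a, b, g):
        return 0.0 if x <= a else 1.0 - math.exp(-((x - a) / b) ** g)

    return p * cdf(a1, b1, g1) + (1.0 - p) * cdf(a2, b2, g2)

def cvm_distance(xs, theta):
    """W_n^2 of (3.8) between H_theta and the empirical distribution function."""
    ys = sorted(xs)
    n = len(ys)
    return 1.0 / (12.0 * n) + sum(
        (mixture_cdf(y, theta) - (i - 0.5) / n) ** 2
        for i, y in enumerate(ys, start=1))

def sample_mixture(n, theta, rng):
    """Draw n values from the mixture by inverting each Weibull cdf."""
    p, a1, b1, g1, a2, b2, g2 = theta
    out = []
    for _ in range(n):
        a, b, g = (a1, b1, g1) if rng.random() < p else (a2, b2, g2)
        u = rng.random()
        out.append(a + b * (-math.log(1.0 - u)) ** (1.0 / g))
    return out

rng = random.Random(0)
true_theta = (0.5, 7.5, 3.0, 3.6, 12.0, 3.0, 3.6)  # hypothetical values
xs = sample_mixture(200, true_theta, rng)

# Illustration only: search over p with the other six parameters fixed.
grid = [k / 100.0 for k in range(1, 100)]
p_md = min(grid, key=lambda p: cvm_distance(xs, (p,) + true_theta[1:]))
```

In practice all 7 parameters are free, and each squared term in (3.8) would be passed as a residual to a nonlinear least squares routine.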

4. Simulation Results 

In Section 3 we discussed the problem of estimation in
the mixture of Weibulls model. From that discussion it
appears that the minimum distance techniques are preferable
for estimating the parameters in a mixture of three-parameter
Weibulls, especially in terms of computational
convenience. In this section we will discuss the results of
an initial computer simulation which was designed for use in
evaluating the numerical capabilities of this method. All
computations were performed on the CDC 6600 at Southern
Methodist University. In this section we will evaluate the
performance of the MD estimation procedures discussed. Since
the usual procedure is to assume that the components are
normal, we will compare the Weibull based MDEs with the
normal based procedures. We have generated samples from
mixtures of normal components and mixtures of $\chi^2(9)$
components. Obviously, we would expect the normal based
procedures to perform better than Weibull based procedures
when the mixture really is a mixture of normal components.
However, if the Weibull techniques are to be useful, then
they must give reasonable results in this situation since
the normal assumption does appear to be a reasonable
assumption in some cases. Since the Weibull with $\gamma \approx 3.6$ is
very nearly normal, there is reason to believe that Weibull
procedures will perform well in this situation. We have not
simulated samples from mixtures of Weibull distributions,
but we plan to consider this in the future. Of course, as
mentioned in the previous section, we are most interested in
the performance of the Weibull based procedures when the
underlying components from which we sample are not
necessarily Weibulls, but are realistic representatives of
the types of component distributions we see in practice.

Our simulation results are based on 200 samples of size
n=200 from mixtures of normal and of $\chi^2(9)$ components. In
each mixture, the variances associated with the two
components are equal. In fact, the two component
distributions differ from each other only by a location
shift. We have simulated from mixtures having mixing
proportions of .25, .50, and .75, and with varying degrees
of separation between the two component distributions.
Overlap as defined by Woodward et al. (1982) is a
quantification of this separation. It is defined as the
probability of misclassification using the rule:

Classify an observation x as:
population 1 if $x < x_c$
population 2 if $x > x_c$

where, without loss of generality, population 1 is assumed to
be centered to the left of population 2, and where $x_c$ is the
unique point between $\mu_1$ and $\mu_2$ such that

$$p f_1(x_c) = (1-p) f_2(x_c).$$
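
For normal components with a common variance, the overlap can be computed directly from this definition. The sketch below is our own illustration, with hypothetical function names: it locates $x_c$ by bisection between the two means and then evaluates the misclassification probability.

```python
import math

def norm_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def norm_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def cut_point(p, mu1, mu2, sigma):
    """Bisection for x_c in (mu1, mu2) with p*f1(x_c) = (1-p)*f2(x_c)."""
    def g(x):
        return p * norm_pdf(x, mu1, sigma) - (1.0 - p) * norm_pdf(x, mu2, sigma)

    lo, hi = mu1, mu2
    if g(lo) <= 0.0 or g(hi) >= 0.0:
        raise ValueError("no crossing point between the component means")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def overlap(p, mu1, mu2, sigma):
    """Probability of misclassification under the rule stated above."""
    xc = cut_point(p, mu1, mu2, sigma)
    return (p * (1.0 - norm_cdf(xc, mu1, sigma))
            + (1.0 - p) * norm_cdf(xc, mu2, sigma))
```

With p=.5 and unit variances, an overlap of .10 corresponds to a separation between the means of about 2.563, i.e. twice the upper 10% point of the standard normal.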

We have based our current study on "overlaps" of .03 and
.10. In Figure 4 we display the mixture densities associated
with normal components. For each mixture, the scaled
components $p f_1(x)$ and $(1-p) f_2(x)$ are also shown. Note that
the densities for p=.75 are not displayed here. Since $\sigma_1 = \sigma_2$,
it follows that $f_p(x) = f_{1-p}(\mu_1 + \mu_2 - x)$, where $f_p(x)$ denotes the
mixture density associated with a mixing proportion of p.
Thus the shapes of the densities at p=.75 can be inferred
from those at p=.25. Likewise, parameter estimation for
p=.75 is not included in the results of the simulations for
the mixtures of normals. In Figure 5 we display the mixture
densities associated with the mixtures of $\chi^2(9)$ components.
Note that although we refer to a mixture of $\chi^2(9)$
distributions here, they are actually "shifted" chi-squares,
i.e. the left truncation points are different from zero.

For each of the simulated samples, three sets of 
parameter estimates were obtained: 

(1) ML estimates based on mixture of normals model (MLEN) 

(2) MD estimates based on mixture of normals model (MDEN) 

(3) MD estimates based on mixture of Weibulls model (MDEW) 

Although the MLEN and MDEN provide estimates of all 5 of the 
parameters of the mixture of normals model, and the MDEW 
produces estimates for all 7 parameters in the mixture of 
Weibulls model, only the results for the estimation of p 
will be shown. The mixing proportion is the parameter of 
primary interest, and when dealing with the "wrong-model"
situations, the remaining parameter estimates often do not 
have a meaningful interpretation. For purposes of aiding in 
the discussions which follow, we will call a component model 
from which we actually simulated, a "simulation component 
model", while a component model which is assumed under a 
particular estimation procedure will be called an 
"estimation component model". Thus, a "wrong-model" 
situation is one in which the simulation component models 
are not the same as the estimation component models. 

In the "correct-model" situations, i.e. using the MLEN
or MDEN to estimate the parameters of a simulated mixture of
normal components, the true parameter values are used as
starting values for the iterative estimation procedures. In
all of the other cases, there is not a "true" set of
parameters. For starting values, we have used the "true"
mixing proportion, and then estimated the parameters of each
component separately using a method of moments procedure.
Consider a situation in which the estimation components are
normal. We obtain starting values for each component by
equating the first and second moments of the corresponding
simulation and estimation components and using these to
obtain $\mu_j$ and $\sigma_j^2$ for the normal estimation component. When
the estimation components are Weibull, we have taken the
approach of setting the starting value for $\gamma$ at $\gamma = 3.6$ for
each component. Then the first two moments of the
corresponding simulation and estimation components are
equated to yield starting value estimates for the other two
parameters. We believe that this provides a "neutral" start.
If the final estimates reflect the finding of substantial
skewness for one or both of the component Weibulls, this
will be because of the data and not because of "skewed"
starting values.
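
With $\gamma$ fixed at 3.6, equating the first two moments through (2.6) gives the starting values for $\beta$ and $\alpha$ in closed form. The following sketch illustrates the calculation; the function name is our own, and the example moments are those of a $\chi^2(9)$ shifted to a left truncation point of 7.5 (mean 16.5, variance 18).

```python
import math

def weibull_start(mean, var, gamma=3.6):
    """Starting values (alpha, beta, gamma) matching a given mean and variance.

    Uses (2.6) with the shape fixed at gamma: the variance determines beta,
    and the mean then determines alpha.
    """
    g1 = math.gamma(1.0 / gamma + 1.0)
    g2 = math.gamma(2.0 / gamma + 1.0)
    beta = math.sqrt(var / (g2 - g1 ** 2))
    alpha = mean - beta * g1
    return alpha, beta, gamma

# Moments of a chi-square(9) variate shifted by 7.5.
alpha0, beta0, gamma0 = weibull_start(16.5, 18.0)
```

Substituting the returned values back into (2.6) reproduces the supplied mean and variance.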

The normal component models were generated with $\mu_1 = 7.5$,
$\sigma_1^2 = \sigma_2^2 = 1$, and $\mu_2$ positioned so that the desired overlap is
obtained. As mentioned previously, both components in the
chi-square mixtures were "shifted" chi-squares. In our
simulations, the left truncation point for population 1 was
always taken to be 7.5, and for population 2 it was located
so that the desired overlap was obtained. In the MLEN and
MDEN procedures, the natural constraints $\sigma_1 > 0$, $\sigma_2 > 0$, and
$0 \le p \le 1$ were imposed. Similarly, for the MDEW, the natural
constraints $\beta_1 > 0$, $\gamma_1 > 0$, $\beta_2 > 0$, $\gamma_2 > 0$, and $0 \le p \le 1$ were imposed
along with the constraints $\alpha_1 \ge 0$ and $\alpha_2 \ge 0$, which are
reasonable constraints on the left-truncation point which
would be imposed due to physical considerations, etc.

In Table 1 we display the results of the simulations.
For a given simulation model and estimation procedure, we
obtain an estimate $\hat p$ of p, defined by

$$\hat p = \frac{1}{n_s}\sum_{i=1}^{n_s} \hat p_i$$

where $\hat p_i$ is the estimate of p for the ith sample, and $n_s$ is
the number of samples. Then based upon the simulations,
estimates of the bias and MSE are given by:

$$\text{bias} = \frac{1}{n_s}\sum_{i=1}^{n_s} (\hat p_i - p) = \hat p - p$$

$$\text{MSE} = \frac{1}{n_s}\sum_{i=1}^{n_s} (\hat p_i - p)^2.$$
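
These summary statistics amount to the following short computation, sketched here in Python with a function name of our own choosing.

```python
def summarize(estimates, p_true):
    """Return (p_hat, bias, MSE) for a list of per-sample estimates of p."""
    n_s = len(estimates)
    p_hat = sum(estimates) / n_s
    bias = p_hat - p_true
    mse = sum((e - p_true) ** 2 for e in estimates) / n_s
    return p_hat, bias, mse
```

Note that the MSE is taken about the true p rather than about $\hat p$, so it reflects both the bias and the variability of the estimates.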

Upon viewing the results, it can be seen that the MDEW
was competitive when the component models were actually
normally distributed, and it produced the best overall
results for the chi-square mixtures. Of particular interest
is the chi-square mixture where p=.5 and overlap=.10. This
is the mixture displayed in Figure 5c and also in Figure 1
(except for location shift). When symmetric components are
assumed (as with the MLEN and MDEN), a bias does occur in
the estimation of p as discussed in Section 1. This behavior
has been noted previously by Woodward et al. (1982). We see
from the table that the MDEW performs substantially better
than either of these normal based procedures on the basis of
both bias and MSE. In Figure 6 we display histograms of the
200 estimates of p obtained from the three estimation
procedures for the chi-square mixture shown in Figure 5c. It
can be seen there that the normal based procedures
consistently estimated p to be substantially less than .5
while the estimates based on Weibull components were in
general closer to the true value p=.5.


Table 1 - Simulation Results
Comparing Normal Based with
Weibull Based Estimation Procedures

Sample size = 200
Number of repetitions = 200

Mixture of Normals

                       Overlap = .10              Overlap = .03
                   $\hat p$   Bias    MSE     $\hat p$   Bias    MSE

p = .25   MLEN       .27      .02    .022       .25      .00    .022
          MDEN       .37      .12    .074       .26      .01    .004
          MDEW       .34      .09    .044       .30      .05    .011

p = .50   MLEN       .50      .00    .014       .50      .00    .002
          MDEN       .49     -.01    .023       .47     -.03    .002
          MDEW       .48     -.02    .019       .51      .01    .00?

Mixture of $\chi^2(9)$

                       Overlap = .10              Overlap = .03
                   $\hat p$   Bias    MSE     $\hat p$   Bias    MSE

p = .25   MLEN       .24     -.01    .061       .18     -.07    .006
          MDEN       .41      .16    .098       .17     -.08    .008
          MDEW       .50      .25    .122       .29     -.04    .007

p = .50   MLEN       .27     -.23    .064       .45     -.05    .011
          MDEN       .26     -.24    .061       .41     -.09    .010
          MDEW       .42     -.08    .024       .50      .00    .004

p = .75   MLEN       .50     -.25    .070       .65     -.10    .013
          MDEN       .48     -.27    .085       .64     -.11    .016
          MDEW       .62     -.13    .032       .71      .04    .005

The one case in which the Weibull based estimates were
not best was when p=.25 with overlap=.10. This mixture is
displayed in Figure 5a, where it is obvious that estimation
should be difficult since there is no distinct contribution
due to component 1 in the mixture. Indeed, all procedures
yield poor estimates as measured by the high MSEs. In Figure
7, we display histograms of the $\hat p_i$ values obtained from the
three estimation procedures for this set of parameter
configurations. There it can be seen that the Weibull
procedure certainly gave the poorest results, with estimates
being spread nearly uniformly between 0 and 1. However, the
normal based procedures also had difficulty as is reflected
in the histograms. In fact, there appears to be a tendency
for the $\hat p_i$ values to be very low (approximately .10).
Nevertheless, $\hat p$ is very close to .25 for the MLEN since several
of the $\hat p_i$ values were spread out uniformly between 0 and 1,
which increased the estimate of p to near .25. The
large MSE shown in the table for this case reflects this
lack of accuracy.







5. Concluding Remarks 


Results in this report and in the report by Woodward, 
et.al.(1982) indicate that the normal based procedures 
perform poorly in the presence of a mixture of asymmetric 
distributions. In this paper we have suggested the mixture 
of Weibulls model as an alternative to the mixture of 
normals model in this situation. Results indicate that 
minimum distance estimation of the parameters of a mixture 
of Weibulls is a viable alternative to the normal-based 
techniques currently in use. 

Before this procedure can be recommended and
implemented, further research is needed. For example, the
problem of how to obtain starting values for the parameters
of mixtures of possibly asymmetric components has not been 
resolved. Also, the Weibull based procedures should be 
applied to LANDSAT data in order to examine their 
performance on the types of asymmetry which will be 
encountered in practice. The fact that an additional 
parameter has been introduced into the model for each 
component has caused the estimation procedures to be slower 
than for the normal based procedures. Further investigation 
concerning the practical aspects of actually implementing 
the procedures is needed. 